Collective training without exposing patients.
Four hospitals jointly train a sepsis-prediction model — without any of them seeing the others' data, and without the aggregator seeing individual gradients.
Scenario
4 hospitals want to jointly train a sepsis-prediction model. Each has 50 of its own patients. The more data, the better the model.
Why plain FL isn't enough
Federated Learning solves "data stays local". But research shows gradients leak data: in some cases a patient's record can be reconstructed from the gradient it contributed to.
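To make the leak concrete, consider the extreme case of a batch with a single patient under a linear model with squared-error loss. The sketch below is illustrative only: `gradientOf` and `recoverFeatures` are hypothetical helper names, and the patient values are invented. In this setting the gradient is the prediction error times the features (plus the error itself for the bias), so dividing by the bias component returns the features.

```go
package main

import "fmt"

// gradientOf returns the squared-error gradient of a linear model for ONE
// record: the first len(x) entries are err*x[i], the last entry is err (bias).
func gradientOf(w []float64, bias float64, x []float64, y float64) []float64 {
	pred := bias
	for i := range x {
		pred += w[i] * x[i]
	}
	err := pred - y
	grad := make([]float64, len(x)+1)
	for i := range x {
		grad[i] = err * x[i]
	}
	grad[len(x)] = err
	return grad
}

// recoverFeatures inverts a single-record gradient: x[i] = grad[i] / grad[bias].
// It returns nil when the error term is zero (nothing to divide by).
func recoverFeatures(grad []float64) []float64 {
	err := grad[len(grad)-1]
	if err == 0 {
		return nil
	}
	x := make([]float64, len(grad)-1)
	for i := range x {
		x[i] = grad[i] / err
	}
	return x
}

func main() {
	// Invented, normalized values: heart rate, blood pressure, lactate.
	patient := []float64{1.25, 0.5, 2.0}
	w := []float64{0.5, -0.25, 0.125}
	grad := gradientOf(w, 0.0, patient, 1.0)
	fmt.Println("gradient: ", grad)
	fmt.Println("recovered:", recoverFeatures(grad))
}
```

Averaging over many patients makes the inversion less direct than this one-record case, but it does not remove the underlying dependence of the gradient on the records.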
FL + FHE
Gradients are encrypted before leaving. The aggregator sums them under encryption. Only the sum is decrypted. Recovering any individual gradient would require breaking the encryption.
Define the parameters.
Before federated training, we choose CKKS parameters. For ML gradients, CKKS is the natural scheme.
Capacity · CKKS
What each term means
CKKS ("approximate" scheme) — FHE family for real numbers. Controlled approximation noise (~10⁻¹¹ error here). Standard scheme for federated ML — gradients are real-valued vectors.
8 192 slots — Each ciphertext is a vector of 8 192 values. A single ciphertext carries the gradient vector of a whole hospital. 4 hospitals = 4 ciphertexts added together.
Multiplicative depth 2 — The number of multiplications that can be chained on one ciphertext. Federated aggregation (sum, then multiply by 1/N) consumes only one level, so depth 2 is comfortable.
~128 bits of security — Industry standard. Breaking the key would require ~2¹²⁸ operations. Infeasible.
RLWE base — Ring Learning With Errors. The same family of lattice problems underlies ML-KEM and ML-DSA, which rely on module-lattice variants and were standardized by NIST as post-quantum cryptography.
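The relationship between these figures can be sketched in a few lines. This is not the API of any FHE library: `ckksParams` is a hypothetical container, and `LogN = 14` is inferred from the 8 192 slots using the standard CKKS relation that a ring of degree N packs N/2 values.

```go
package main

import "fmt"

// ckksParams is a hypothetical container for the figures quoted above;
// it is not the API of any FHE library.
type ckksParams struct {
	LogN  int // log2 of the ring degree N
	Depth int // multiplicative depth (number of rescalings available)
}

// slots returns the number of values one ciphertext can pack.
// CKKS packs N/2 values into a ring of degree N.
func (p ckksParams) slots() int { return 1 << (p.LogN - 1) }

// fits reports whether a gradient of the given length fits in one ciphertext.
func (p ckksParams) fits(gradLen int) bool { return gradLen <= p.slots() }

func main() {
	p := ckksParams{LogN: 14, Depth: 2} // LogN = 14 is inferred from 8 192 slots
	fmt.Println("slots:", p.slots())
	fmt.Println("4-number gradient fits:", p.fits(4))
}
```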
Collaborative key.
In real production, the decryption key is split among the 4 hospitals via threshold cryptography. No single hospital can decrypt even the average gradient — only with quorum.
Who holds the shares
Albert Einstein — share 1/4
Sírio-Libanês — share 2/4
HCor — share 3/4
Oswaldo Cruz — share 4/4
Required quorum: 4 of 4. Without cooperation of all four, not even the average gradient is revealed.
Why threshold in this case
No trusted central operator — In traditional federated ML there is usually a "trusted aggregator" that centralizes everything. That is fragile: the aggregator can be breached, compromised, or simply dishonest. Threshold eliminates the single trusted operator.
Each hospital is a peer — The 4 hospitals are peer institutions; there is no hierarchy between them. Threshold fits naturally in horizontal collaborations between same-level institutions.
Distributed audit — Every decryption logs an auditable record from all 4 parties. No hospital can "run secret analyses" over the others' data.
Each hospital trains locally.
Each hospital runs a gradient descent step over its own 50 patients. The data never leaves the hospital. The training output is only a 4-number vector, the gradient: one component per feature weight plus one for the bias.
What each hospital does
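grad := [4]float64{} // 3 weight components + 1 bias component
N := float64(len(local_patients))
for _, patient := range local_patients {
    // heart rate, blood pressure, lactate
    features := patient.X
    y_true := patient.sepsis
    // linear prediction with current weights
    y_pred := dot(w, features) + bias
    err := y_pred - y_true
    // accumulate gradient: weights first, then bias
    for i := range features {
        grad[i] += err * features[i] / N
    }
    grad[3] += err / N
}
// Result: 4-number vector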
What is happening
Sepsis prediction — A simple linear model (regression over 3 clinical features: heart rate, blood pressure, lactate level). Sepsis is one of the conditions where early detection measurably saves lives.
Local training = local data — The hospital iterates over its patients inside its own system. Every error computation happens on-premises. Patient records never leave the hospital at any moment.
Gradient = 4 numbers — The output of the entire training step is a tiny vector of 4 real values (3 weight components and 1 bias component): the "direction" the model weights should move in. It is this vector that leaves the hospital.
But here's the catch — In plain FL, this 4-number vector is sent in cleartext. Research on gradient leakage attacks has shown that a gradient can be inverted to reconstruct part of the dataset. That's why we have to encrypt before sending.
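A self-contained version of the local step might look like the sketch below. It assumes the 4 numbers are three weight components followed by one bias component, and it uses two invented records in place of a hospital's 50 patients; `patient` and `localGradient` are hypothetical names.

```go
package main

import "fmt"

// patient holds three clinical features and the sepsis label (0 or 1).
type patient struct {
	X      [3]float64 // heart rate, blood pressure, lactate (assumed normalized)
	Sepsis float64
}

// localGradient returns the mean squared-error gradient over the local
// patients: three weight components followed by one bias component.
func localGradient(w [3]float64, bias float64, patients []patient) [4]float64 {
	var grad [4]float64
	n := float64(len(patients))
	if n == 0 {
		return grad
	}
	for _, p := range patients {
		pred := bias
		for i, x := range p.X {
			pred += w[i] * x
		}
		err := pred - p.Sepsis
		for i, x := range p.X {
			grad[i] += err * x / n
		}
		grad[3] += err / n
	}
	return grad
}

func main() {
	// Two invented records stand in for a hospital's 50 patients.
	local := []patient{
		{X: [3]float64{1, 0, 2}, Sepsis: 1},
		{X: [3]float64{0, 1, 0}, Sepsis: 0},
	}
	w := [3]float64{0.5, 0.5, 0}
	fmt.Println(localGradient(w, 0, local))
}
```

Only the returned array would be handed to the encryption step; the `patient` values stay inside the function's caller.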
Gradient encrypted before leaving.
Each hospital encrypts its 4-number gradient vector with the consortium public key before any byte leaves. Only ciphertext crosses the internet.
What each hospital sends
grad := [0.0234, -0.0117,
0.0089, -0.0042]
// encryption (on hospital server)
pt := encoder.Encode(grad)
ct := encryptor.Encrypt(pt)
// send ONLY ct
send(ct) // 768 KB
Why encrypt 4 numbers?
Gradient leakage attacks — Research from IBM, Google and academia has shown: from cleartext gradients you can partially recover the training data that produced them. For small datasets like 50 patients with 3 features each, in some cases individual patients can be recovered at alarming quality.
That's why FHE — Encrypting the 4 numbers before sending is what makes FL truly private. Without FHE, FL is a marketing promise: the data leaks through the gradients. With FHE, the promise is backed by a cryptographic guarantee.
Acceptable overhead — 4 floats in cleartext = 32 bytes. Encrypted = 768 KB. Massive overhead, but: (1) executed only once per training round, (2) encryption takes a few milliseconds, (3) the alternative (plain FL) is not defensible.
Same public key — The 4 hospitals encrypt with the SAME public key (generated via threshold in step 2). That is what lets the aggregator sum the ciphertexts in the next step.
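The 768 KB figure is consistent with a back-of-envelope estimate, under assumptions the text does not state explicitly: a ring degree of 16 384 (implied by 8 192 slots), 3 ciphertext moduli (depth 2 plus a base level), 2 polynomials per ciphertext, and 8 bytes per coefficient. The helper `ciphertextBytes` is hypothetical and ignores serialization headers.

```go
package main

import "fmt"

// ciphertextBytes estimates the size of a fresh ciphertext:
// 2 polynomials x ringDegree coefficients x moduli limbs x 8 bytes per limb.
func ciphertextBytes(ringDegree, moduli int) int {
	const polys, bytesPerCoeff = 2, 8
	return polys * ringDegree * moduli * bytesPerCoeff
}

func main() {
	// Assumed: N = 16 384 (8 192 slots) and 3 moduli (depth 2 + base level).
	ct := ciphertextBytes(16384, 3)
	plain := 4 * 8 // four float64 values
	fmt.Printf("ciphertext: %d KB, plaintext: %d bytes, ratio: %dx\n",
		ct/1024, plain, ct/plain)
}
```

Under these assumptions the size does not depend on how many of the 8 192 slots are actually used, so a gradient with thousands of entries would cost the same bandwidth as one with 4.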
Summing gradients under encryption.
The aggregator receives the 4 ciphertexts and sums them using the homomorphic property of CKKS. It then multiplies by 1/4 to obtain the average. All encrypted, in 1 ms. The aggregator never sees any individual gradient.
The algorithm
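// 1. sum the ciphertexts
ctSum := evaluator.Add(cts[0], cts[1])
for i := 2; i < N; i++ {
    ctSum = evaluator.Add(ctSum, cts[i])
}
// 2. multiply by 1/N (average); 0.25 because N = 4
ctSum = evaluator.Mul(ctSum, 0.25)
// bring the scale back down after the multiplication
ctSum = evaluator.Rescale(ctSum)
// → ctSum now encodes:
// (grad₁+grad₂+grad₃+grad₄) / 4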
What the aggregator knows
Knows it summed 4 things — It has a record that it received and processed 4 ciphertexts. Knows the time, size, and IDs of participating hospitals.
Does NOT know what it summed — It has no idea what the individual gradient values are. It does not even know whether Einstein's gradient was positive or negative. Not even the sign.
Does not even know the result — Even the encrypted average gradient is opaque to the aggregator. It can only return the ciphertext to the hospitals. Only they, with the key shares, can decrypt it.
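1 ms for 4 hospitals — Measured speed in this example. In production with dozens of hospitals and thousands of parameters (larger models), the time grows roughly linearly with the number of ciphertexts added, on top of a fixed cost for the final multiply and rescale: ~10 ms for 100 parties.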
Model collectively updated.
The 4 hospitals cooperate (4-of-4 quorum) to decrypt ONLY the average gradient. The 4 individual gradients are never decrypted. The hospitals then apply the average gradient to the weights to update the model.
Gradient application
avg_grad := [0.0156, -0.0089,
0.0067, -0.0028]
// SGD: w ← w − lr · grad
lr := 0.5
for i := range w {
w[i] -= lr * avg_grad[i]
}
// → updated collective model
// next round begins
What this round produced
The model is now collective — The updated weights reflect the joint learning of all 4 hospitals. Each one benefits from the others' knowledge, without having seen any of their data.
Next round — In real production this cycle repeats dozens to hundreds of times until the model converges. Every round stays under encryption. Every round improves the model.
The model is shared — The updated weights are shared among the 4 hospitals. Anyone with access to the model can use it to predict sepsis. The weights reflect only the aggregated learning, not any individual patient record.
The scientific gain — A collective model trained on 200 patients (4×50) can be expected to generalize better than a model trained in isolation on 50.
Mathematical comparison.
To prove that the encrypted aggregation produced the same average gradient as a traditional plaintext aggregation, we compare them side by side.
FHE vs Plaintext
Comparison between the average gradient decrypted from FHE and the average gradient computed in cleartext over the same 4 vectors. The two differ only at the 11th decimal place.
Why this precision is enough
11 decimal places — For comparison: ML gradients typically have magnitude between 10⁻³ and 10⁻¹. A 10⁻¹¹ error is at least 8 orders of magnitude smaller than the signal. Negligible.
Convergence preserved — A model trained with FHE aggregation converges to effectively the same final weights as one trained with plaintext aggregation. CKKS precision is more than enough for SGD.
CKKS is the right scheme — With BGV (exact scheme) we would have to encode gradients as integers — possible but inconvenient. CKKS works natively on reals with controlled precision.
Plain FL vs FL + FHE.
The difference between traditional FL and FL+FHE is not cosmetic. It is the difference between a fragile marketing promise and a verifiable mathematical guarantee.
Plain FL sends gradients in cleartext, and gradients leak training data. That's why plain FL is inadequate for clinical data. The "privacy promise" is fragile because the leak happens through the very artifact FL was supposed to protect.
FL+FHE keeps every individual gradient encrypted end to end, which makes it the only defensible architecture. For any federated ML application on clinical data (sepsis, oncology, medical imaging), FL+FHE is the technical standard that serious regulation will require.
Defensible federated ML.
In a few milliseconds of cryptographic work, 4 hospitals completed a joint training round of a sepsis-prediction model, without any of them seeing another's data and without the aggregator seeing individual gradients.
The complete flow
- Defined CKKS parameters for gradient vectors
- Key split among 4 hospitals via threshold
- Each hospital trained locally over its 50 patients
- Encrypted local gradient before sending (3.7 ms)
- Aggregator summed the 4 ciphertexts under encryption (1 ms)
- Hospitals collectively decrypted the average gradient
- Updated model reflects collective knowledge of the 200 patients