Factual Demonstration

Collective training without exposing patients.

Four hospitals jointly train a sepsis-prediction model — without any of them seeing the others' data, and without the aggregator seeing individual gradients.

Scenario

4 hospitals want to jointly train a sepsis-prediction model. Each has 50 of its own patients. The more data, the better the model.

Why plain FL isn't enough

Federated Learning solves "data stays local". But research shows gradients leak data: training records can be partially reconstructed from a gradient.

FL + FHE

Gradients are encrypted before leaving. The aggregator sums them under encryption. Only the sum is decrypted. Because no individual gradient is ever decrypted, reconstructing one hospital's patients from its gradient is computationally infeasible for the aggregator.

Step 01 · Setup

Define the parameters.

Before federated training, we choose CKKS parameters. For ML gradients, CKKS is the natural scheme.

Capacity · CKKS

8 192 · Slots/ciphertext

2 · Mult. depth

APPROX. · Scheme type

~128 bit · Security

What each term means

CKKS ("approximate" scheme) — FHE family for real numbers. Controlled approximation noise (~10⁻¹¹ error here). Standard scheme for federated ML — gradients are real-valued vectors.

8 192 slots — Each ciphertext is a vector of 8 192 values. A single ciphertext carries the gradient vector of a whole hospital. 4 hospitals = 4 ciphertexts added together.

Multiplicative depth 2 — The number of multiplications that can be chained before the ciphertext runs out of levels. Federated aggregation (sum, then multiply by 1/N) consumes a single level, so depth 2 leaves headroom.

~128 bits of security — Industry standard. Breaking the key would require ~2¹²⁸ operations. Infeasible.

RLWE base — Ring Learning With Errors, a lattice problem. ML-KEM and ML-DSA, standardized by NIST as post-quantum cryptography, rest on closely related module-lattice problems.

Step 02 · Keys

Collaborative key.

In real production, the decryption key is split among the 4 hospitals via threshold cryptography. No single hospital can decrypt even the average gradient — only with quorum.

Who holds the shares

Albert Einstein — share 1/4

Sírio-Libanês — share 2/4

HCor — share 3/4

Oswaldo Cruz — share 4/4

Required quorum: 4 of 4. Without cooperation of all four, not even the average gradient is revealed.

Why threshold in this case

No trusted central operator — In traditional federated ML there is usually a "trusted aggregator" that centralizes everything. That is fragile: the aggregator can be breached, compromised, or simply dishonest. Threshold eliminates the single trusted operator.

Each hospital is a peer — The 4 hospitals are peer institutions; there is no hierarchy between them. Threshold fits naturally in horizontal collaborations between same-level institutions.

Distributed audit — Every decryption logs an auditable record from all 4 parties. No hospital can "run secret analyses" over the others' data.
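The all-or-nothing property of a 4-of-4 quorum can be illustrated with a toy additive split. This is only a sketch on a single 64-bit integer, not on an RLWE key, and it is not the demo's key-generation protocol; `splitSecret`, `combine` and the seed parameter are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// splitSecret splits secret into n additive shares modulo 2^64.
// The first n-1 shares are random; the last is chosen so that all n
// shares sum (with uint64 wraparound) to the secret.
// The seed only makes this demo reproducible; real shares must come
// from a cryptographic source such as crypto/rand.
func splitSecret(secret uint64, n int, seed int64) []uint64 {
	rng := rand.New(rand.NewSource(seed))
	shares := make([]uint64, n)
	var sum uint64
	for i := 0; i < n-1; i++ {
		shares[i] = rng.Uint64()
		sum += shares[i]
	}
	shares[n-1] = secret - sum
	return shares
}

// combine adds the given shares modulo 2^64.
func combine(shares []uint64) uint64 {
	var sum uint64
	for _, s := range shares {
		sum += s
	}
	return sum
}

func main() {
	secret := uint64(123456789)
	shares := splitSecret(secret, 4, 1) // one share per hospital

	fmt.Println("4 of 4 shares recover the secret:", combine(shares) == secret)
	fmt.Println("3 of 4 shares recover the secret:", combine(shares[:3]) == secret)
}
```

With truly random shares, any three of them sum to a value that is statistically independent of the secret, which is why the quorum here has to be all four parties.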

Step 03 · Local training

Each hospital trains locally.

Each hospital runs a gradient descent step over its own 50 patients. The data never leaves the hospital. The training output is only a 4-number vector: the gradient.

What each hospital does

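// grad holds 4 values: one per feature weight, plus the bias
grad := make([]float64, 4)
n := float64(len(local_patients))

// For each local patient:
for _, patient := range local_patients {
  // heart rate, blood pressure, lactate
  features := patient.X
  y_true := patient.sepsis

  // linear prediction with current weights
  y_pred := dot(w, features) + bias
  err := y_pred - y_true

  // accumulate gradient (weights, then bias)
  for j := range features {
    grad[j] += err * features[j] / n
  }
  grad[3] += err / n
}

// Result: 4-number vector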

What is happening

Sepsis prediction — A simple linear model (regression over 3 clinical features: heart rate, blood pressure, lactate level). Sepsis is one of the conditions where early detection measurably saves lives.

Local training = local data — The hospital iterates over its patients inside its own system. Every error computation happens on-premises. Patient records never leave the hospital.

Gradient = 4 numbers — The output of the entire training step is a tiny vector of 4 real values (one per feature weight, plus one for the bias): the "direction" the model weights should move in. It is this derivative that leaves the hospital.

But here's the catch — In plain FL, this 4-number vector is sent in cleartext. Research has shown (gradient leakage attacks) that it is possible to reverse a gradient to reconstruct part of the dataset. That's why we have to encrypt before sending.
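The simplest case of gradient leakage can be worked out by hand. For a linear model and a batch of exactly one patient, the weight gradient is err·x and the bias gradient is err, so dividing one by the other returns the patient's features. The sketch below assumes that single-patient setting; the `Patient` type, the function names and the feature values are hypothetical, and recovery from a gradient averaged over 50 patients is harder than this.

```go
package main

import "fmt"

// Patient is a hypothetical record: 3 normalized features and a 0/1 label.
type Patient struct {
	X      [3]float64
	Sepsis float64
}

// localGradient returns [dw0, dw1, dw2, dbias], the gradient of the
// loss 0.5 * mean(err^2) for a linear model over the given patients.
func localGradient(w [3]float64, bias float64, patients []Patient) [4]float64 {
	var grad [4]float64
	n := float64(len(patients))
	for _, p := range patients {
		pred := bias
		for j, x := range p.X {
			pred += w[j] * x
		}
		err := pred - p.Sepsis
		for j, x := range p.X {
			grad[j] += err * x / n
		}
		grad[3] += err / n
	}
	return grad
}

// invertSingle recovers the features behind a single-patient gradient:
// grad[j] = err*x[j] and grad[3] = err, so x[j] = grad[j] / grad[3].
// It is undefined when the prediction error is exactly zero.
func invertSingle(grad [4]float64) [3]float64 {
	var x [3]float64
	for j := range x {
		x[j] = grad[j] / grad[3]
	}
	return x
}

func main() {
	w := [3]float64{0.1, -0.2, 0.05}
	batch := []Patient{{X: [3]float64{0.82, 0.45, 0.67}, Sepsis: 1}}

	g := localGradient(w, 0, batch)
	fmt.Println("gradient sent in cleartext:", g)
	fmt.Println("features recovered from it:", invertSingle(g))
}
```

The attacker in this sketch needs nothing but the gradient itself, not even the current weights.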

Step 04 · Encryption

Gradient encrypted before leaving.

Each hospital encrypts its 4-value gradient vector with the consortium public key before any byte leaves. Only ciphertext crosses the internet.

What each hospital sends

// local cleartext gradient:
grad := [0.0234, -0.0117,
          0.0089, -0.0042]

// encryption (on hospital server)
pt := encoder.Encode(grad)
ct := encryptor.Encrypt(pt)

// send ONLY ct
send(ct)  // 768 KB

3.7 ms · Per hospital

768 KB · Ciphertext

Why encrypt 4 numbers?

Gradient leakage attacks — Research from IBM, Google and academia has shown: from cleartext gradients you can partially recover the training data that produced them. For small datasets like 50 patients with 3 features each, in some cases individual patients can be recovered at alarming quality.

That's why FHE — Encrypting the 4 numbers before sending is what makes FL truly federated. Without FHE, FL is a marketing promise — the data leaks through the gradients. With FHE, the promise becomes a theorem.

Acceptable overhead — 4 floats in cleartext = 32 bytes. Encrypted = 768 KB. Massive overhead, but: (1) executed only once per training round, (2) encryption takes a few milliseconds, (3) the alternative (plain FL) is not defensible.

Same public key — The 4 hospitals encrypt with the SAME public key (generated via threshold in step 2). That is what lets the aggregator sum the ciphertexts in the next step.
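The 768 KB figure can be sanity-checked with a short calculation. The sketch below assumes a ring degree of twice the slot count (CKKS packs ringDegree/2 slots), three moduli for a depth-2 chain, two polynomials per ciphertext and 8 bytes per coefficient. None of these layout details are stated in the demo, and `ciphertextBytes` is a hypothetical helper.

```go
package main

import "fmt"

// ciphertextBytes estimates the size of a ciphertext stored as `polys`
// polynomials of `ringDegree` coefficients, with one 8-byte word per
// coefficient for each modulus in the chain.
func ciphertextBytes(ringDegree, moduli, polys int) int {
	const bytesPerCoeff = 8
	return polys * ringDegree * moduli * bytesPerCoeff
}

func main() {
	slots := 8192
	ringDegree := 2 * slots // assumption: ringDegree/2 slots per ciphertext
	moduli := 3             // assumption: depth 2 needs 3 moduli
	size := ciphertextBytes(ringDegree, moduli, 2)

	plainBytes := 4 * 8 // four float64 gradient values
	fmt.Printf("%d bytes = %d KiB, %dx the cleartext\n",
		size, size/1024, size/plainBytes)
	// prints "786432 bytes = 768 KiB, 24576x the cleartext"
}
```

Under these assumptions the size does not depend on how many of the 8 192 slots are used, so a gradient with thousands of values would cost the same bytes on the wire as this 4-value one.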

Step 05 · Aggregation

Summing gradients under encryption.

The aggregator receives the 4 ciphertexts and sums them using the homomorphic property of CKKS. Multiplies by 1/4 for the average. All encrypted, in 1 ms. The aggregator never sees any individual gradient.

The algorithm

// 1. sum the 4 ciphertexts (keep each result)
N := len(cts)
ctSum := evaluator.Add(cts[0], cts[1])
for i := 2; i < N; i++ {
  ctSum = evaluator.Add(ctSum, cts[i])
}

// 2. multiply by 1/N (average), then rescale
ctSum = evaluator.Mul(ctSum, 1.0/float64(N))
ctSum = evaluator.Rescale(ctSum)

// → ctSum now encodes:
// (grad₁+grad₂+grad₃+grad₄) / 4

1 ms · Total time

What the aggregator knows

Knows it summed 4 things — It has a record that it received and processed 4 ciphertexts. Knows the time, size, and IDs of participating hospitals.

Does NOT know what it summed — It has no idea what the individual gradient values are. It does not even know whether Einstein's gradient was positive or negative. Not even the sign.

Does not even know the result — Even the encrypted average gradient is opaque to the aggregator. It can only return the ciphertext to the hospitals. Only they, with the key shares, can decrypt it.

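1 ms for 4 hospitals — Real measured speed. In production with dozens of hospitals, the number of ciphertext additions grows linearly with the number of parties, while the multiplication by 1/N and the rescale run once per round. The estimate is ~10 ms for 100 parties. Larger models with thousands of parameters still fit in one ciphertext per hospital, up to the 8 192 slots.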
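For reference, the cleartext computation that the encrypted aggregation mirrors is a slot-wise mean. The sketch below is a plain Go version with a hypothetical `averageGradients` helper. Only the first input vector comes from the encryption step of this demo; the other three are made-up values for illustration.

```go
package main

import "fmt"

// averageGradients returns the slot-wise mean of the input vectors,
// the cleartext counterpart of "add the ciphertexts, multiply by 1/N".
// All vectors are assumed to have the same length.
func averageGradients(grads [][]float64) []float64 {
	avg := make([]float64, len(grads[0]))
	for _, g := range grads {
		for j, v := range g {
			avg[j] += v
		}
	}
	inv := 1.0 / float64(len(grads))
	for j := range avg {
		avg[j] *= inv
	}
	return avg
}

func main() {
	grads := [][]float64{
		{0.0234, -0.0117, 0.0089, -0.0042}, // vector shown in the encryption step
		{0.0100, -0.0050, 0.0040, -0.0020}, // made-up
		{0.0200, -0.0100, 0.0080, -0.0030}, // made-up
		{0.0150, -0.0090, 0.0060, -0.0020}, // made-up
	}
	fmt.Println("average gradient:", averageGradients(grads))
}
```

Because every hospital contributes the same number of patients, this unweighted mean of the four local gradients equals the gradient computed over the pooled 200 patients. With unequal cohorts, each gradient would need a weight proportional to its cohort size.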

Step 06 · Decryption

Model collectively updated.

The 4 hospitals cooperate (4-of-4 quorum) to decrypt ONLY the average gradient. The 4 individual gradients are never decrypted. The hospitals then apply the average gradient to the weights to update the model.

Gradient application

// decrypted average gradient:
avg_grad := [0.0156, -0.0089,
                0.0067, -0.0028]

// SGD: w ← w − lr · grad
// (first 3 entries: feature weights; 4th: bias)
lr := 0.5
for i := range w {
  w[i] -= lr * avg_grad[i]
}
bias -= lr * avg_grad[3]

// → updated collective model
// next round begins
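A runnable version of this update step might look like the sketch below. It assumes three feature weights plus a separate bias, with the fourth gradient entry belonging to the bias, and it starts from all-zero weights purely for illustration; `applyGradient` is a hypothetical name.

```go
package main

import "fmt"

// applyGradient performs one SGD step, w ← w − lr·grad, on the three
// feature weights, and the same update on the bias using grad[3].
// Arrays are passed by value, so the caller's weights are not modified.
func applyGradient(w [3]float64, bias, lr float64, grad [4]float64) ([3]float64, float64) {
	for j := range w {
		w[j] -= lr * grad[j]
	}
	bias -= lr * grad[3]
	return w, bias
}

func main() {
	avgGrad := [4]float64{0.0156, -0.0089, 0.0067, -0.0028} // values from the demo
	start := [3]float64{}                                   // assumption: weights start at zero

	w, bias := applyGradient(start, 0, 0.5, avgGrad)
	fmt.Printf("w = %.5f, bias = %.5f\n", w, bias)
}
```

Each hospital runs the same update on the same decrypted average, so all four copies of the model stay identical going into the next round.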

What this round produced

The model is now collective — The updated weights reflect the joint learning of all 4 hospitals. Each one benefits from the others' knowledge, without having seen any of their data.

Next round — In real production this cycle repeats dozens to hundreds of times until the model converges. Every round stays under encryption, and each round moves the shared weights along the averaged gradient.

The model is shared — The updated weights are distributed to all 4 hospitals. Anyone with access to the model can use it to predict sepsis. The weights encode the aggregated learning of the cohort, not any individual patient record.

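The scientific gain — A collective model trained on 200 patients (4×50) is fitted on four times the data of any model trained in isolation on 50. For comparable patient populations, that roughly halves the standard error of the estimated weights.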

Step 07 · Validation

Mathematical comparison.

To check that the encrypted aggregation produced the same average gradient as a traditional plaintext aggregation, we compare them side by side.

FHE vs Plaintext

8 × 10⁻¹¹ · Max error

Comparison between the average gradient decrypted from FHE and the average gradient computed in cleartext over the same 4 vectors. Difference at the 11th decimal place.

Why this precision is enough

11 decimal places — For comparison: ML gradients typically have magnitude between 10⁻³ and 10⁻¹. A 10⁻¹¹ error is 8 to 10 orders of magnitude smaller than the signal. Negligible.

Convergence preserved — A model trained with FHE aggregation converges to the same final weights as one trained with plaintext aggregation, up to this approximation error. CKKS precision is more than enough for SGD.

CKKS is the right scheme — With BGV (exact scheme) we would have to encode gradients as integers — possible but inconvenient. CKKS works natively on reals with controlled precision.
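The side-by-side comparison reduces to a maximum absolute difference between two vectors. The sketch below shows that metric with a hypothetical `maxAbsError` helper. The "decrypted" values are not the demo's actual output: they are the cleartext average perturbed by made-up noise of the same order as the reported error.

```go
package main

import (
	"fmt"
	"math"
)

// maxAbsError returns the largest slot-wise absolute difference
// between two vectors of equal length.
func maxAbsError(a, b []float64) float64 {
	worst := 0.0
	for i := range a {
		worst = math.Max(worst, math.Abs(a[i]-b[i]))
	}
	return worst
}

func main() {
	plain := []float64{0.0156, -0.0089, 0.0067, -0.0028}
	// Hypothetical decrypted values with made-up noise around 1e-11.
	decrypted := []float64{0.0156 + 8e-11, -0.0089 - 3e-11, 0.0067 + 1e-11, -0.0028}

	e := maxAbsError(plain, decrypted)
	fmt.Printf("max error %.1e, relative to the smallest component %.1e\n",
		e, e/0.0028)
}
```

Dividing by the smallest gradient component gives the worst-case relative error, which is the quantity that matters for the SGD update.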

Step 08 · Adversarial

Plain FL vs FL + FHE.

The difference between traditional FL and FL+FHE is not cosmetic. It is the difference between a fragile marketing promise and a verifiable mathematical guarantee.

Plain FL (no FHE)

The aggregator sees individual gradients in cleartext. Research from IBM, Google and academia has shown (Deep Leakage from Gradients, NeurIPS 2019; Inverting Gradients, NeurIPS 2020) that from a gradient it IS possible to reconstruct part of the dataset that produced it. For small datasets like 50 patients, reconstruction can reach alarming levels: the aggregator effectively "sees" patients it was never supposed to see.

That's why plain FL is inadequate for clinical data. The "privacy promise" is fragile because the leak happens through the very artifact FL was supposed to protect.

FL + FHE (this demo)

The aggregator never sees an individual gradient. It handles only the aggregated sum, and even that stays encrypted. Reconstructing the original dataset from individual gradients is ruled out for the aggregator because individual gradients NEVER EXIST in cleartext outside their origin hospital.

It is the only defensible architecture. For any federated ML application on clinical data (sepsis, oncology, medical imaging), FL+FHE is the technical standard that serious regulation will require.

Step 09 · Summary

Defensible federated ML.

In a few milliseconds of cryptographic work, 4 hospitals jointly completed a training round of a sepsis-prediction model, without any of them seeing another's data and without the aggregator seeing individual gradients.

The complete flow

Defined CKKS parameters for gradient vectors
Key split among 4 hospitals via threshold
Each hospital trained locally over its 50 patients
Encrypted local gradient before sending (3.7 ms)
Aggregator summed the 4 ciphertexts under encryption (1 ms)
Hospitals collectively decrypted the average gradient
Updated model reflects collective knowledge of the 200 patients

Real numbers

3.7 ms · Encryption per hospital

1 ms · Encrypted aggregation

768 KB · Per ciphertext

~10⁻¹¹ · Precision error

3 eBooks use this primitive

FHE_FARMACEUTICA_EBOOK (multicenter RWE) · FHE_HOSPITAIS_EBOOK (predictive models over a multi-hospital cohort) · FHE_BANCOS_EBOOK (collaborative scoring across institutions).

The core thesis

FL is a fragile privacy promise: gradients leak data. FL+FHE turns the promise into a theorem. For any federated ML over clinical, financial or citizen data, FL+FHE is the technical standard that serious regulation will require in the next 5 years.