Computable Sovereignty.
100 families contribute to the census. IBGE produces an aggregated table, with no individual record exposed to anyone.
Scenario
IBGE runs the census. Each family contributes data encrypted locally. A central server computes per-region aggregates for targeting social programs.
Problem
Today IBGE stores cleartext data in its own data centers. Researcher access requires slow individual authorization. Permanent leakage risk.
Guarantee
Under FHE: data stays permanently encrypted. Aggregated tables are generated under encryption. Collective decryption requires IBGE + ANPD + AGU.
Define the parameters.
Before the encrypted census, we choose BGV parameters. Each is a trade-off between security, speed and capacity.
BGV Parameters
What each term means
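BGV ("exact" scheme) — FHE family that operates over integers modulo a plaintext modulus. Ciphertexts still carry noise, but decryption removes it completely, so results are exact rather than approximate. Census counts need absolute exactness, so BGV is the right scheme.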
8 192 slots — Each ciphertext is a vector of 8 192 values. In real census production, each slot can encode a family or a region. A single operation processes them all.
Multiplicative depth 2 — How many chained multiplications the ciphertext supports. Census sums need only additions, which consume no multiplicative level and add far less noise than multiplications, so a low depth suffices.
~128 bits of security — Industry standard. Breaking the key would require ~2^128 operations with the best known attacks: astronomical. No efficient quantum attack on the underlying lattice problem is known either.
RLWE base — Ring Learning With Errors. It belongs to the same family of lattice problems (Module-LWE, Module-SIS) behind ML-KEM and ML-DSA, standardized by NIST as post-quantum cryptography.
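The link between the slot count and the plaintext modulus can be checked with a few lines of Python. This is a minimal sketch, assuming the ring degree N equals the 8 192 slots quoted above and using the standard batching condition for power-of-two rings (t prime and t ≡ 1 mod 2N). The candidate moduli 65537 and 257 are illustrative values, not the demo's actual parameters.

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small moduli checked here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def supports_full_batching(t: int, ring_degree: int) -> bool:
    """True if plaintext modulus t yields `ring_degree` independent slots.

    For a power-of-two ring degree N the usual condition is:
    t is prime and t = 1 (mod 2N).
    """
    return is_prime(t) and t % (2 * ring_degree) == 1


N = 8192  # assumed ring degree, equal to the slot count quoted above
print(supports_full_batching(65537, N))  # True: 65537 = 4 * 16384 + 1
print(supports_full_batching(257, N))    # False: prime, but 257 mod 16384 != 1
```

A modulus that fails the check can still encrypt integers, but it would not give one independent slot per ring coefficient.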
National key split.
In real production, the decryption key does not belong to a single entity. It is mathematically split among multiple institutions, each with its own constitutional role.
Who holds the shares
IBGE — technical census operator. Holds share 1. Alone, cannot decrypt anything.
ANPD — national data protection authority. Holds share 2. Ensures usage complies with LGPD.
AGU — Federal Attorney General. Holds share 3. Legal custodian of the operation.
Required quorum: 3 of 3. Without authorization from all three, the result remains mathematically inaccessible.
What "threshold cryptography" is
Key split, not copied — The secret key that decrypts the result is mathematically split into N pieces (shares) using techniques like Shamir Secret Sharing. No single share carries useful information about the whole key — like handing out 3 puzzle pieces where no single piece reveals the image.
Collective decryption — To decrypt, each party generates a "decryption share" (without revealing its key piece). The shares are aggregated mathematically and only then does the result appear in cleartext. If a party is missing, the aggregate is useless.
k-of-n (quorum) — In real production you can configure "3-of-5" or "4-of-7" for redundancy: any k of the n parties can decrypt, so the system still works if up to n-k parties are unavailable. Here we use 3-of-3 (all mandatory) for simplicity.
Triple guarantee — Technical (mathematical), legal (LGPD via ANPD), political (public audit via AGU). Every decryption leaves an irrefutable trail of the three authorizations.
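The splitting idea can be illustrated with a toy Shamir Secret Sharing sketch over a prime field. The field modulus, the function names `split` and `reconstruct`, and the integer "key" are all hypothetical choices for illustration. Unlike a real threshold FHE protocol, this sketch reconstructs the secret in one place; in the flow described above, each party would only publish a decryption share and the key itself would never be reassembled.

```python
import secrets

P = 2**127 - 1  # Mersenne prime used as field modulus (illustrative choice)


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares; any k of them reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


key = 123456789
ibge, anpd, agu = split(key, k=3, n=3)        # the 3-of-3 setup used here
print(reconstruct([ibge, anpd, agu]) == key)  # True

five = split(key, k=3, n=5)                   # a 3-of-5 variant
print(reconstruct([five[0], five[2], five[4]]) == key)  # True
```

With k = 3, any two shares are consistent with every possible secret, which is why a single institution learns nothing from its own share.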
Families contribute encrypted.
Each family answers the census on its own device (app, terminal, census taker). The data is encrypted before any byte leaves the home. ONLY the ciphertext travels to IBGE.
Concrete example · Family X
Cleartext data (on the phone):
household_income: R$ 850/month
poverty_line : YES
num_people : 4
Binary vector generated locally:
total_by_region = [0,1,0,0,0]
↑ second position (index 1) = Northeast = 1
Local encryption: the vector is encrypted with the consortium public key (generated during the threshold phase). Only a ~384 KB ciphertext leaves the home.
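The local encoding step for Family X can be sketched as follows. This is an illustrative sketch only: the region order follows the result table (North, Northeast, Midwest, Southeast, South), the R$ 218 per-capita threshold is the one quoted in the table notes, and the function name `encode_family` and the second vector `poor_by_region` are assumptions introduced here to account for the "Poor" column.

```python
REGIONS = ["North", "Northeast", "Midwest", "Southeast", "South"]
POVERTY_LINE_PER_CAPITA = 218.0  # R$ per month, threshold quoted in the table notes


def encode_family(household_income: float, num_people: int, region: str):
    """Return (total_by_region, poor_by_region) as one-hot vectors."""
    idx = REGIONS.index(region)
    per_capita = household_income / num_people
    is_poor = per_capita < POVERTY_LINE_PER_CAPITA
    total = [1 if i == idx else 0 for i in range(len(REGIONS))]
    poor = [1 if (i == idx and is_poor) else 0 for i in range(len(REGIONS))]
    return total, poor


total, poor = encode_family(850.0, 4, "Northeast")
print(total)  # [0, 1, 0, 0, 0]
print(poor)   # [0, 1, 0, 0, 0]  since 850 / 4 = 212.50 < 218
```

Only these vectors would be encrypted and sent; the income and household size never leave the device.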
What each piece means
Characteristic (one-hot) vector — Instead of sending "I am in the Northeast and I am poor", the family sends a binary vector where each position represents a region and the value is 1 or 0. This lets IBGE simply sum the vectors to get totals per region.
Encryption before sending — The crucial difference vs a traditional census: the vector is encrypted inside the home, before it travels. At no point does a cleartext copy exist outside the family's device.
3 ms per dataset — Encrypting an 8 192-slot vector takes a few milliseconds on a commodity smartphone. Speed is not a bottleneck.
100 families in the demo — Didactic version. In real IBGE production, millions of families contribute in parallel. The pattern scales because each encryption is independent.
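One plausible way to arrive at the ~384 KB figure is a back-of-the-envelope size estimate. The sketch below is an assumption, not a description of the demo's serialization: it supposes a fresh ciphertext made of 2 polynomials of 8 192 coefficients, each coefficient stored as 3 residues (one per modulus, which would match depth 2 plus a base level) in 64-bit words, and it ignores any metadata header.

```python
def ciphertext_bytes(ring_degree: int, num_moduli: int,
                     polys: int = 2, bytes_per_coeff: int = 8) -> int:
    """Rough payload size of one ciphertext, ignoring headers."""
    return polys * ring_degree * num_moduli * bytes_per_coeff


size = ciphertext_bytes(ring_degree=8192, num_moduli=3)  # assumed values
print(size)          # 393216
print(size // 1024)  # 384 (KiB)
```

Under these assumptions 100 families would upload about 37.5 MiB in total, which is why bandwidth is rarely the bottleneck for additive workloads.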
What reaches the server.
The ciphertexts travel over the internet to IBGE — and they can even be hosted on a foreign cloud (AWS, Azure, GCP) without any issue. The reason is mathematical.
What the server sees
Incoming bytes:
74 65 78 74 4d 65 74 61
44 61 74 61 22 3a 7b 22
53 63 61 6c 65 22 3a 7b
...
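The first bytes of the stream, such as those shown here, belong to a readable serialization header (they decode to ASCII field names like MetaData and Scale). The ciphertext payload that follows is pseudo-random: indistinguishable from noise to any observer without the secret key.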
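A quick sanity check on a dump like the one above is to decode it. Serialized ciphertexts often start with a readable header, and only the payload after it is pseudo-random. The snippet below decodes the 24 bytes shown; it makes no claim about which library produced them.

```python
dump = """
74 65 78 74 4d 65 74 61
44 61 74 61 22 3a 7b 22
53 63 61 6c 65 22 3a 7b
"""

# Strip all whitespace, then turn the hex digits back into bytes.
raw = bytes.fromhex("".join(dump.split()))
print(raw.decode("ascii"))  # textMetaData":{"Scale":{
```

The decoded text looks like a fragment of a JSON-style metadata header. It reveals the encoding parameters, not the census answers, so it does not weaken the confidentiality argument.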
Why a foreign cloud can host it
US CLOUD Act — A 2018 US law that compels American companies (including the operators of AWS, Azure and GCP) to hand over customer data under US court order, even if the data sits physically outside the US. It is one of Brazil's main digital-sovereignty concerns today.
Why FHE neutralizes it — The cloud only stores ciphertext. It has no key (the key is distributed in Brazil among IBGE+ANPD+AGU). Even under US court order compelling it to hand over everything, it would hand over pseudo-random bytes. Nobody — not even the US judge — can extract useful information.
Data sovereignty through math — Instead of migrating everything to a national cloud (expensive, slow, limited), Brazil can use global cloud infrastructure without giving up sovereignty. It pursues the same goal as the sovereign-cloud initiatives of other countries (France's SecNumCloud qualification, the Franco-German Gaia-X project), but by cryptographic rather than contractual means.
Server sums under encryption.
The central server receives ciphertexts from every family and aggregates them — without ever decrypting any of them. The core operation is literally an encrypted sum.
The algorithm · pseudo-code
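// 1. Start from the first family's ciphertext
ct_accum := ct[0]
// 2. Sum them all under encryption
for i := 1; i < N; i++ {
    // plain assignment: ":=" here would declare a new variable on every pass
    ct_accum = evaluator.Add(ct_accum, ct[i])
}
// 3. Result: ct_total_by_region
// no decryption up to this point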
What is happening
Homomorphic sum — The "homomorphic" in FHE (Fully Homomorphic Encryption) names exactly this property: the sum of two ciphertexts decrypts to the sum of the plaintexts, Dec(ct₁ + ct₂) = m₁ + m₂ (modulo the plaintext modulus). The server runs the operation without ever seeing the numbers.
Slot-wise — Each ciphertext has 8 192 slots (5 of them active for the 5 regions). The sum is performed slot by slot. Slot 0 (North) only accumulates 1's from Northern families. Slot 1 (Northeast) only from Northeast families. And so on. All in parallel, in a single operation.
Why this demo is "trivial" — The didactic example arrives with the aggregates pre-summed to simplify the walkthrough. In real IBGE production, this is where the millions of sums happen — and it is the step that gains most from slot batching.
No decryption — At no point does the server touch cleartext values. It only knows it has two "opaque objects" and that the Add operation returns a third opaque object. The math guarantees that this third object, once decrypted by the quorum, will be the correct sum.
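The additive property can be demonstrated with a toy LWE-style scheme in plain Python. This is a didactic sketch, not BGV and not the library behind the demo: it is symmetric-key, encrypts one slot per ciphertext, and uses parameters (`Q`, `T`, `DIM`) far too small to be secure. All names are hypothetical.

```python
import random

Q = 2**32          # ciphertext modulus (toy value)
T = 1024           # plaintext modulus: counts must stay below T
DELTA = Q // T     # scaling factor that lifts messages above the noise
DIM = 16           # secret dimension (far too small to be secure)

rng = random.Random(2024)  # fixed seed so the sketch is reproducible


def keygen():
    return [rng.randrange(2) for _ in range(DIM)]


def encrypt(secret, m):
    """Encrypt one integer as (a, b) with b = <a, s> + DELTA*m + e mod Q."""
    a = [rng.randrange(Q) for _ in range(DIM)]
    e = rng.randint(-4, 4)  # small noise
    b = (sum(x * s for x, s in zip(a, secret)) + DELTA * m + e) % Q
    return a, b


def add(ct1, ct2):
    """Homomorphic addition: component-wise sum mod Q, no key needed."""
    a = [(x + y) % Q for x, y in zip(ct1[0], ct2[0])]
    return a, (ct1[1] + ct2[1]) % Q


def decrypt(secret, ct):
    a, b = ct
    noisy = (b - sum(x * s for x, s in zip(a, secret))) % Q
    return ((noisy + DELTA // 2) // DELTA) % T  # rounding removes the noise


def encrypt_vector(secret, vec):
    return [encrypt(secret, m) for m in vec]


def add_vectors(cv1, cv2):
    return [add(c1, c2) for c1, c2 in zip(cv1, cv2)]


sk = keygen()
families = [[0, 1, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
acc = encrypt_vector(sk, families[0])
for vec in families[1:]:
    acc = add_vectors(acc, encrypt_vector(sk, vec))  # server-side step
print([decrypt(sk, ct) for ct in acc])  # [1, 2, 0, 0, 0]
```

The `add` function never touches `sk`, which mirrors the server's position: it combines opaque objects and only a key holder can read the total. A real BGV ciphertext packs all slots into one object, so the loop over slots disappears.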
The public table.
After the quorum (IBGE + ANPD + AGU) authorizes decryption, the aggregated result is revealed. ONLY the per-region aggregate — no individual record.
Decrypted result
| Region | Poor | Total | % Poverty |
|---|---|---|---|
| North | 4 | 13 | 30.8% |
| Northeast | 10 | 21 | 47.6% |
| Midwest | 3 | 15 | 20.0% |
| Southeast | 5 | 30 | 16.7% |
| South | 4 | 21 | 19.0% |
How to read this table
Poor — how many families in that region are below the poverty line (monthly income < R$ 218 per capita).
Total — how many sample families are in that region.
% Poverty — relative poverty rate, computed after decryption (not under encryption). It is the number that feeds social-program targeting.
Expected distribution — Northeast leads (47.6%), Southeast is lowest (16.7%). Consistent with real IBGE data — the demo generated the synthetic data calibrated to that distribution.
What is NOT here — no individual family shows up. No taxpayer ID. No address. No specific income. Only the 5 aggregated counts and the 5 totals.
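The percentage column is ordinary arithmetic on the decrypted counts. A minimal sketch using the figures from the table above:

```python
regions = ["North", "Northeast", "Midwest", "Southeast", "South"]
poor = [4, 10, 3, 5, 4]
total = [13, 21, 15, 30, 21]


def poverty_rates(poor, total):
    """Percentage of poor families per region, rounded to one decimal."""
    return [round(100 * p / t, 1) for p, t in zip(poor, total)]


for name, rate in zip(regions, poverty_rates(poor, total)):
    print(f"{name}: {rate}%")
# North: 30.8%
# Northeast: 47.6%
# Midwest: 20.0%
# Southeast: 16.7%
# South: 19.0%
```

Division is done in cleartext because integer schemes such as BGV provide addition and multiplication, not division, on encrypted values.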
Mathematical validation.
To prove that the encrypted computation produced the same result as a traditional plaintext computation, we compare them side by side.
FHE × plaintext comparison
| Region | FHE | Cleartext | Error |
|---|---|---|---|
| North | 4 | 4 | 0 |
| Northeast | 10 | 10 | 0 |
| Midwest | 3 | 3 | 0 |
| Southeast | 5 | 5 | 0 |
| South | 4 | 4 | 0 |
Why it is exact
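BGV is an "exact" scheme — Unlike CKKS (which operates on approximate real numbers), BGV operates on integers modulo a prime. Its ciphertexts do carry noise, but as long as the noise stays within budget, decryption removes it entirely. Provided no count reaches the plaintext modulus, the result of the encrypted sum is always identical to the plaintext sum.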
Why this matters for a census — In public statistics, "31% poor in the North" and "30.8% poor in the North" are different numbers. Precision must be absolute. BGV guarantees it. For hospital benchmarking or federated ML, approximate CKKS is enough. For census, we demand exact BGV.
Public audit possible — Because homomorphic evaluation is deterministic (randomness enters only at encryption time), any independent auditor can rerun the same operation over the same ciphertexts and obtain the same aggregate ciphertext. Reproducibility is part of the guarantee.
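Exactness comes with one condition worth spelling out: sums are computed modulo the plaintext modulus, so the true count must stay below it. The sketch below shows the wrap-around in plain modular arithmetic; the moduli 65537 and 257 are illustrative values, not the demo's parameters.

```python
def modular_sum(values, t):
    """Sum as an integer FHE scheme would see it: reduced modulo t."""
    total = 0
    for v in values:
        total = (total + v) % t
    return total


ones = [1] * 300                  # 300 families in a single region
print(modular_sum(ones, 65537))   # 300: modulus large enough, result exact
print(modular_sum(ones, 257))     # 43: the count wrapped around (300 - 257)
```

For a national census the plaintext modulus therefore has to exceed the largest possible cell count, or counts have to be split across several slots.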
Three-layer defense.
Let's imagine three different adversaries trying to extract individual data. Each one fails for a different reason — and that redundancy is the defense.
Adversary 1 — Foreign government (CLOUD Act)
Attack: a US judge orders AWS/Azure/GCP to hand over all Brazilian data hosted on their servers.
Defense: the cloud only has ciphertext. It hands over pseudo-random bytes. Without the key (distributed in Brazil among IBGE+ANPD+AGU), nothing is decryptable. Even the NSA with a classical supercomputer would need ~2^128 operations to attempt to break a single key.
Adversary 2 — Dishonest IBGE employee
Attack: an employee with access to the IBGE share tries to decrypt individual data.
Defense: they only have 1 of the 3 required shares. Alone, their share is mathematically useless. To decrypt they would have to convince ANPD and AGU to collaborate — three simultaneous institutional decisions, with audit.
Adversary 3 — Researcher with public datasets
Attack: a legitimate researcher tries to cross-reference the aggregated table with public datasets (electoral rolls, CNPJ, OpenStreetMap) to re-identify specific families.
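Defense: an aggregated table published with k-anonymity (k ≥ 10 per cell) is resistant, because each cell represents at least 10 indistinguishable families. Cells smaller than that, as in the 100-family demo table, would need to be suppressed or merged before publication. For extra hardening, differential privacy (calibrated Gaussian noise) can be applied before publishing.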
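One common way to enforce a minimum cell size is small-cell suppression. The sketch below applies a threshold of 10 to the demo table; the function name `suppress_small_cells` is hypothetical and this is only one of several possible disclosure-control rules.

```python
K_MIN = 10  # minimum cell size quoted in the text


def suppress_small_cells(table: dict, k: int = K_MIN) -> dict:
    """Return a copy of `table` where every count below k becomes None."""
    return {
        region: {name: (count if count >= k else None)
                 for name, count in cells.items()}
        for region, cells in table.items()
    }


demo = {
    "North":     {"poor": 4,  "total": 13},
    "Northeast": {"poor": 10, "total": 21},
    "Midwest":   {"poor": 3,  "total": 15},
    "Southeast": {"poor": 5,  "total": 30},
    "South":     {"poor": 4,  "total": 21},
}

for region, cells in suppress_small_cells(demo).items():
    print(region, cells)
```

On the 100-family sample only the Northeast "poor" cell survives, which shows why the threshold is meant for production-scale data where cells hold thousands of families.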
Why three layers?
Defense in depth — The strength doesn't come from a single "perfect" barrier, but from three independent barriers that would all need to be broken simultaneously. Breaking the math (RLWE) is infeasible. Breaking the key distribution requires a triple institutional collusion. Breaking anonymization requires re-identification against a k-anonymous, optionally DP-protected aggregate.
It is not "trust in IBGE" — The defense is deliberately built to survive a malicious IBGE. That is the point of FHE + threshold: trust nobody, and still work.
Sovereign State by design.
What this demo proves, in a few sentences: it is possible to run a public census without any single institution (not even IBGE) holding unilateral access to individual records.
The complete flow
- Defined BGV parameters (exact scheme, ideal for counting)
- Key split into 3 shares (IBGE, ANPD, AGU)
- 100 families locally encrypted their data as binary vectors
- Ciphertexts traveled (national or foreign cloud, doesn't matter)
- Server summed the ciphertexts under encryption (homomorphic Add)
- 3-of-3 quorum authorized decryption
- Per-region aggregated table revealed: only the 5 poverty counts and the 5 totals
- Validation confirmed exact match with plaintext