Factual Demonstration

Computable Sovereignty.

100 families contribute to the census. IBGE produces an aggregated table, and no individual record is exposed to anyone.

Scenario

IBGE runs the census. Each family contributes data encrypted locally. A central server computes per-region aggregates for targeting social programs.

Problem

Today IBGE stores cleartext data in its own data centers. Researcher access requires slow, case-by-case authorization, and the risk of a leak never goes away.

Guarantee

Under FHE: data stays permanently encrypted. Aggregated tables are generated under encryption. Collective decryption requires IBGE + ANPD + AGU.

Step 01 · Setup

Define the parameters.

Before the encrypted census, we choose BGV parameters. Each is a trade-off between security, speed and capacity.

BGV Parameters

Slots per ciphertext: 8 192

Scheme type: EXACT

Multiplicative depth: 2

Security: ~128 bits

What each term means

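BGV ("exact" scheme) — FHE family that operates over integers modulo a plaintext modulus. Its internal noise is removed exactly at decryption, so results carry no approximation error. Census counts need absolute exactness, which makes BGV the right scheme.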

8 192 slots — Each ciphertext is a vector of 8 192 values. In real census production, each slot can encode a family or a region. A single operation processes them all.

Multiplicative depth 2 — How many chained multiplications a ciphertext supports before its noise budget runs out. Census sums need essentially none: additions consume no multiplicative depth and add only a little noise, so depth 2 leaves headroom.

~128 bits of security — Industry standard. The best known attacks would need on the order of 2¹²⁸ operations to recover the key, which is infeasible, and no known quantum algorithm solves the underlying lattice problem efficiently.

RLWE base — Ring Learning With Errors. It is a close relative of the Module-LWE problem behind ML-KEM and ML-DSA, the post-quantum standards published by NIST as FIPS 203 and FIPS 204.

Step 02 · Threshold

National key split.

In real production, the decryption key does not belong to a single entity. It is mathematically split among multiple institutions, each with a distinct legal role.

Who holds the shares

IBGE — technical census operator. Holds share 1. Alone, cannot decrypt anything.

ANPD — national data protection authority. Holds share 2. Ensures usage complies with LGPD.

AGU — Federal Attorney General. Holds share 3. Legal custodian of the operation.

Required quorum: 3 of 3. Without authorization from all three, the result remains mathematically inaccessible.

What "threshold cryptography" is

Key split, not copied — The secret key that decrypts the result is mathematically split into N pieces (shares) using techniques like Shamir Secret Sharing. Any set of shares smaller than the quorum reveals nothing about the key: unlike puzzle pieces, a share does not even show part of the picture.

Collective decryption — To decrypt, each party generates a "decryption share" (without revealing its key piece). The shares are aggregated mathematically and only then does the result appear in cleartext. If a party is missing, the aggregate is useless.

k-of-n (quorum) — In real production you can configure "3-of-5" or "4-of-7" for redundancy: the system keeps working with up to n − k parties unavailable, while any k − 1 colluding parties still learn nothing. Here we use 3-of-3 (all mandatory) for simplicity.

Triple guarantee — Technical (mathematical), legal (LGPD via ANPD), political (public audit via AGU). Every decryption requires the three authorizations, which can be logged as an auditable trail.

Why this matters for the State: Threshold cryptography is what lets sensitive data be processed without any public agency alone (not even the technical operator) having unilateral access. It is distributed control by mathematical design, not by institutional good faith.

Step 03 · Collection

Families contribute encrypted.

Each family answers the census on its own device (app, terminal, census taker). The data is encrypted before any byte leaves the home. ONLY the ciphertext travels to IBGE.

Concrete example · Family X

Cleartext data (on the phone):

region         : Northeast
household_income: R$ 850/month
poverty_line   : YES
num_people     : 4

Binary vector generated locally:

poor_by_region   = [0,1,0,0,0]
total_by_region  = [0,1,0,0,0]
                  ↑
           Northeast = 1

Local encryption: each vector is encrypted with the consortium public key (generated during the threshold phase). Only ciphertexts, about 384 KB each, leave the home.

What each piece means

Characteristic (one-hot) vector — Instead of sending "I am in the Northeast and I am poor", the family sends binary vectors in which each position represents a region: total_by_region has a 1 in the family's region, and poor_by_region has a 1 there only if the family is below the poverty line. This lets IBGE simply sum the vectors to get totals per region.

Encryption before sending — The crucial difference vs a traditional census: the vector is encrypted inside the home, before it travels. At no point does a cleartext copy exist outside the family's device.

3 ms per family — Encrypting an 8 192-slot vector takes a few milliseconds on commodity hardware (3 ms per family in this demo). Speed is not a bottleneck.

100 families in the demo — Didactic version. In real IBGE production, millions of families contribute in parallel. The pattern scales because each encryption is independent.

Step 04 · Transit

What reaches the server.

The ciphertexts travel over the internet to IBGE, and they can even be hosted on a foreign cloud (AWS, Azure, GCP) without exposing their contents. The reason is mathematical.

What the server sees

Incoming bytes:

7b 22 50 6c 61 69 6e
65 78 74 4d 65 74 61
61 74 61 22 3a 7b 22
63 61 6c 65 22 3a 7b
...

Total sent: ~770 KB (2 ciphertexts)

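The leading bytes shown above decode to readable ASCII ({"Plain…), a public serialization header that describes the ciphertext, not the census data. The polynomial coefficients that make up the rest of the ciphertext are pseudo-random: indistinguishable from noise to any observer without the secret key.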

Why a foreign cloud can host it

US CLOUD Act — A 2018 US law that compels US-based providers (including AWS, Azure, GCP) to hand over customer data under US legal process, even if the data sits physically outside the US. It is a central concern in Brazil's digital-sovereignty debate.

Why FHE neutralizes it — The cloud only stores ciphertext. It has no key (the key is distributed in Brazil among IBGE+ANPD+AGU). Even under US court order compelling it to hand over everything, it would hand over pseudo-random bytes. Nobody — not even the US judge — can extract useful information.

Data sovereignty through math — Instead of migrating everything to a national cloud (expensive, slow, limited), Brazil can use global cloud infrastructure without giving up sovereignty. Other digital-sovereignty efforts pursue the same goal by different means: France's SecNumCloud certification and the European Gaia-X initiative rely mainly on trusted, European-operated infrastructure rather than on cryptography.

Step 05 · Aggregation

Server sums under encryption.

The central server receives ciphertexts from every family and aggregates them — without ever decrypting any of them. The core operation is literally an encrypted sum.

The algorithm · pseudo-code

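// 1. Receive N family ciphertexts ct[0] … ct[N-1]
ctAccum := ct[0]

// 2. Sum them all under encryption (evaluator.Add is schematic;
//    real libraries differ in their exact signatures)
for i := 1; i < N; i++ {
  // "=" not ":=": update the accumulator instead of
  // declaring a new variable that shadows it
  ctAccum = evaluator.Add(ctAccum, ct[i])
}

// 3. Result: ctAccum holds ct_total_by_region
// no decryption up to this point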

What is happening

Homomorphic sum — This is the property that puts the "H" in FHE (Fully Homomorphic Encryption): adding two ciphertexts yields a ciphertext that decrypts to the sum of the plaintexts, Dec(ct₁ + ct₂) = m₁ + m₂ (mod t), where t is the plaintext modulus. "Fully" means multiplication is supported too. The server runs the operation without ever seeing the numbers.

Slot-wise — Each ciphertext has 8 192 slots (5 of them active for the 5 regions). The sum is performed slot by slot: slot 0 (North) only accumulates 1s from Northern families, slot 1 (Northeast) only from Northeast families, and so on. All in parallel, in a single operation.

Why this demo is "trivial" — The didactic example arrives with the aggregates pre-summed to simplify the walkthrough. In real IBGE production, this is where the millions of sums happen — and it is the step that gains most from slot batching.

No decryption — At no point does the server touch cleartext values. For each Add it only knows it has two "opaque objects" and that the operation returns a third opaque object. The math guarantees that the final object, once decrypted by the quorum, will be the correct sum.

Step 06 · Decryption

The public table.

After the quorum (IBGE + ANPD + AGU) authorizes decryption, the aggregated result is revealed. ONLY the per-region aggregate — no individual record.

Decrypted result

Region	Poor	Total	% Poverty
North	4	13	30.8%
Northeast	10	21	47.6%
Midwest	3	15	20.0%
Southeast	5	30	16.7%
South	4	21	19.0%

How to read this table

Poor — how many families in that region are below the poverty line (monthly income < R$ 218 per capita).

Total — how many sample families are in that region.

% Poverty — relative poverty rate, computed after decryption (not under encryption). It is the number that feeds social-program targeting.

Expected distribution — Northeast leads (47.6%), Southeast is lowest (16.7%). The ordering mirrors real IBGE data because the demo's synthetic data was calibrated to that regional distribution.

What is NOT here — no individual family shows up. No taxpayer ID. No address. No specific income. Only the 5 aggregated counts and the 5 totals.

This table is the census's final product: it is what becomes public policy (Bolsa Família, BPC, regional targeting), and it is all that leaves the system. The 100 individual records remain encrypted, and no institution acting alone can ever decrypt them.

Step 07 · Proof

Mathematical validation.

To prove that the encrypted computation produced the same result as a traditional plaintext computation, we compare them side by side.

FHE × plaintext comparison

Region	FHE	Cleartext
North	4	4
Northeast	10	10
Midwest	3	3
Southeast	5	5
South	4	4

Regions with exact match: 5 / 5

Why it is exact

BGV is an "exact" scheme — Unlike CKKS (which operates on approximate real numbers), BGV operates on integers modulo a prime plaintext modulus. Its noise is removed exactly at decryption, so as long as the noise budget is not exhausted and the true counts stay below that modulus, the decrypted sum is identical to the plaintext sum.

Why this matters for a census — In public statistics, "31% poor in the North" and "30.8% poor in the North" are different numbers. Precision must be absolute. BGV guarantees it. For hospital benchmarking or federated ML, approximate CKKS is enough. For census, we demand exact BGV.

Public audit possible — Homomorphic addition is deterministic, so any independent auditor can rerun the same Add over the same published ciphertexts and obtain an identical aggregate ciphertext; and because BGV decryption is exact, the decrypted counts contain no random error. Reproducibility is part of the guarantee.

Step 08 · Adversarial

Three-layer defense.

Let's imagine three different adversaries trying to extract individual data. Each one fails for a different reason — and that redundancy is the defense.

Adversary 1 — Foreign government (CLOUD Act)

Attack: a US judge orders AWS/Azure/GCP to hand over all Brazilian data hosted on their servers.

Defense: the cloud only has ciphertext. It hands over pseudo-random bytes. Without the key (distributed in Brazil among IBGE+ANPD+AGU), nothing is decryptable. Even the NSA with a classical supercomputer would need ~2¹²⁸ operations to attempt to break a single key.

Adversary 2 — Dishonest IBGE employee

Attack: an employee with access to the IBGE share tries to decrypt individual data.

Defense: they only have 1 of the 3 required shares. Alone, their share is mathematically useless. To decrypt they would have to convince ANPD and AGU to collaborate — three simultaneous institutional decisions, with audit.

Adversary 3 — Researcher with public datasets

Attack: a legitimate researcher tries to cross-reference the aggregated table with public datasets (electoral rolls, CNPJ, OpenStreetMap) to re-identify specific families.

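Defense: a published table in which every cell counts at least 10 families (k-anonymity with k ≥ 10) resists this attack, because each cell represents at least 10 indistinguishable families. Several cells in the 100-family demo table are smaller (Midwest has 3 poor families), so a production table would suppress or merge such cells. For extra hardening, differential privacy (calibrated Gaussian noise) can be applied before publishing.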

Why three layers?

Defense in depth — The strength doesn't come from a single "perfect" barrier, but from three independent barriers that would all need to be broken simultaneously. Breaking the math (RLWE) is infeasible. Breaking the key distribution requires a triple institutional collusion. Breaking anonymization requires re-identification against a DP-protected aggregate.

It is not "trust in IBGE" — The defense is deliberately built to survive a malicious IBGE. That is the point of FHE + threshold: trust nobody, and still work.

Step 09 · Summary

Sovereign State by design.

What this demo proves, in a few sentences: it is possible to run a public census without any single institution (not even IBGE) holding unilateral access to individual records.

The complete flow

Defined BGV parameters (exact scheme, ideal for counting)
Key split into 3 shares (IBGE, ANPD, AGU)
100 families locally encrypted their data as binary vectors
Ciphertexts traveled (national or foreign cloud, doesn't matter)
Server summed the ciphertexts under encryption (homomorphic Add)
3-of-3 quorum authorized decryption
Per-region aggregated table revealed — only 5 counts
Validation confirmed exact match with plaintext

The real numbers

Encryption per family: 3 ms

Size per ciphertext: 384 KB

Correct regions: 5 / 5

Precision (BGV): EXACT

What this unlocks for the Brazilian State: decentralized census · social-program targeting without an exposed central registry · mathematical digital sovereignty · support for compliance with LGPD art. 11 · use of foreign cloud without sovereignty loss · institutional traceability via threshold.

The core thesis: traditional data sovereignty relies on "trust the operator". Cryptographic-design sovereignty relies on trusting no one, and it works even if the operator is malicious. It is the only defensible long-term architecture for census, social targeting and any state operation over citizen data.

Extra layer · Differential Privacy: For the 2020 Census, the US Census Bureau adopted differential privacy as a mandatory layer before publishing aggregates, a decision motivated by reconstruction attacks against 2010 data. IBGE has not officially adopted DP yet, but it is best practice. FHE + threshold + DP form the complete architecture: FHE guarantees nobody saw individual data during processing; threshold distributes institutional control; DP adds calibrated Gaussian noise to the published results, bounding what reconstruction attacks can learn even with auxiliary knowledge.