degauss

Find your digital footprint, measure your exposure in bits, and systematically reduce it. Scans 35+ platforms for your accounts, verifies which are really yours, filters false positives, and gives you direct links to delete or privatise each one. Generates legal removal requests under GDPR/CCPA/UK DPA. Maps the data broker supply chain to find where your data originates.

What it does

$ degauss discover --username giuseppe552 --name "Giuseppe Giona" --email g@test.com

Discovering your digital footprint
═══════════════════════════════════════════════════

[1] Finding accounts: giuseppe552
    8 potential accounts found. Verifying...

  DELETE these accounts:
    ● Reddit (forum) — https://www.reddit.com/user/giuseppe552
      → https://www.reddit.com/settings/account
      Settings → Account → Delete account

  PRIVATISE these accounts:
    ● Steam (gaming) — https://steamcommunity.com/id/giuseppe552
      → https://steamcommunity.com/my/edit/settings
      Edit Profile → Privacy Settings → set everything to Private

  FALSE POSITIVES (5 filtered out): Instagram, TikTok, Pinterest, YouTube, PyPI

  Estimated time to clean up: 9 minutes
  34 platforms checked → 8 found → 3 confirmed

Not predictions. Not models. Real accounts, verified, with direct action links.

Sweeney (2000) showed that {ZIP, DOB, sex} uniquely identifies 87% of Americans: the three fields carry ~31.6 bits, more than the 28.3 bits (log₂ of the US population) needed to single one person out. degauss measures your actual exposure in bits and computes a removal strategy to reduce it.
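
The threshold arithmetic is easy to check by hand. A minimal sketch, assuming a US population of roughly 330 million (an assumption chosen because it reproduces the 28.3-bit figure). `uniquenessThreshold` and `surplusBits` are illustrative names, not degauss APIs:

```typescript
// Bits needed to single out one person in a population of the given size.
function uniquenessThreshold(population: number): number {
  if (population < 1) throw new RangeError("population must be at least 1");
  return Math.log2(population);
}

// Positive surplus means the exposed fields carry more bits than the threshold.
function surplusBits(exposureBits: number, population: number): number {
  return exposureBits - uniquenessThreshold(population);
}

const US_POPULATION = 330e6; // assumption, not a value taken from degauss

console.log(uniquenessThreshold(US_POPULATION).toFixed(1)); // 28.3
console.log(surplusBits(31.6, US_POPULATION).toFixed(1));   // 3.3
```

Any positive surplus means the fields are, in expectation, enough to pick out one person.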

Run it

git clone https://github.com/Giuseppe552/degauss.git && cd degauss
npm install && npm run build

Find your footprint (works now, no setup)

# discover accounts, code leaks, and breaches
degauss discover --username your-handle --name "Your Name" --email you@mail.com

# predict which data brokers have your data (instant, no network)
degauss predict --name "Your Name" --country US

# interactive profile builder (4 questions)
degauss init

One command does everything

# scan brokers → score exposure → analyse attacks → map supply chain → build removal plan
degauss me --name "Jane Doe" --city Portland --state OR --email jane@mail.com

# or with a manually-built profile (when automated scanning is blocked)
degauss me --name "Jane Doe" --profile my-exposure.json --email jane@mail.com

Scanning limitation: Most data brokers use Cloudflare to block Tor exit nodes and automated scrapers. The automated scan will attempt all targets through Tor (your IP stays hidden) but may return 0 results. In that case, build your profile manually — search the brokers in an incognito window, note what you find, and feed it to degauss as JSON. The scoring, attack analysis, and removal planning work regardless of how the profile was built.

Output:

[1/5] Scanning data brokers...
  Found on 3 of 6 brokers (11 QIs extracted)

[2/5] Computing exposure score...
  Exposure: 42.3 bits (threshold: 28.3)
  Anonymity set: 1
  YOU ARE UNIQUELY IDENTIFIABLE

[3/5] Analysing attack surface...
  Feasible attacks: 7 (3 critical)
    CRITICAL Bank account phone takeover (95% feasible)
    CRITICAL SIM swap attack (92% feasible)
    HIGH     Spear phishing via employer context (85% feasible)

[4/5] Mapping data supply chain...
  2 upstream removal(s) would cascade to 3 sources
    Acxiom → cascades to spokeo, beenverified

[5/5] Building removal plan...
  1. Remove from spokeo (uk_dpa) — -12.3 bits
  2. Remove from whitepages (uk_dpa) — -8.7 bits

Or step by step

# compute your exposure score
degauss score --profile profile.json --country UK

# optimal removal plan (what to remove first)
node apps/cli/dist/index.js plan --profile profile.json

# generate a UK GDPR Article 17 erasure request
node apps/cli/dist/index.js request --source spokeo --fields full_name,email,phone \
  --country UK --name "Your Name" --email "you@example.com"

# DMCA takedown for your photos
node apps/cli/dist/index.js dmca --source spokeo --photo-url https://... \
  --name "Your Name" --email "you@example.com"

# predict re-emergence after removal
node apps/cli/dist/index.js monitor --sources spokeo,whitepages,radaris

# check if two records refer to the same person (Fellegi-Sunter)
node apps/cli/dist/index.js linkage --record-a a.json --record-b b.json

# generate synthetic profiles for data dilution
node apps/cli/dist/index.js dilute --profile profile.json --count 20 --anchor full_name

Every command outputs JSON to stdout. Pipe into jq, Python, anything.

The maths

Established theory (standard information theory and graph algorithms):

Concept	Method	Reference
Exposure quantification	Shannon entropy, self-information per QI	Shannon (1948), Sweeney (2000)
Name frequencies	US Census 2010 surname data, SSA first names	census.gov
Uniqueness threshold	log₂(N) bits for population N	Golle (2006)
Anonymity set	2^H — effective group size from entropy	Díaz et al. (2002, PET)
Record linkage	Fellegi-Sunter log-likelihood ratios	Fellegi & Sunter (1969, JASA)
String matching	Jaro-Winkler similarity	Jaro (1989), Winkler (2006)
Max-flow / min-cut	Edmonds-Karp algorithm	Edmonds & Karp (1972, JACM)
Data dilution	Synthetic profiles for k-anonymity	Sweeney (2002), Howe & Nissenbaum (2009)
Legal requests	GDPR Art 17, UK DPA, CCPA §1798.105, DMCA §512(c)(3)	—
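
As an illustration of the record-linkage row, here is a rough sketch of Fellegi-Sunter scoring. The m and u probabilities are hypothetical, and the function names are ours rather than the degauss API. Each compared field adds log₂(m/u) when it agrees and log₂((1-m)/(1-u)) when it does not:

```typescript
// m = P(field agrees | same person), u = P(field agrees | different people).
type FieldModel = { m: number; u: number };

function agreementWeight(f: FieldModel): number {
  return Math.log2(f.m / f.u);
}

function disagreementWeight(f: FieldModel): number {
  return Math.log2((1 - f.m) / (1 - f.u));
}

// Sum the weights over every modelled field. Fields missing from `agrees`
// are treated as disagreements. Higher scores favour "same person".
function matchScore(
  models: Record<string, FieldModel>,
  agrees: Record<string, boolean>
): number {
  let score = 0;
  for (const field of Object.keys(models)) {
    const model = models[field];
    score += agrees[field] ? agreementWeight(model) : disagreementWeight(model);
  }
  return score;
}

// Hypothetical parameters for two fields.
const linkageModels: Record<string, FieldModel> = {
  email: { m: 0.9, u: 0.1 },
  phone: { m: 0.9, u: 0.1 },
};

console.log(matchScore(linkageModels, { email: true, phone: true }).toFixed(2));
```

In practice the score is compared against upper and lower thresholds to classify a pair as match, possible match, or non-match.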

Our constructions (heuristic models, not peer-reviewed):

Concept	What it does	Caveat
Identity graph	Records as nodes, linking QIs as weighted edges	Novel framing — Fellegi-Sunter doesn't use graph models
Exposure via min-cut	Uses max-flow to estimate adversarial linkage power	Plausible heuristic, not a proven bound
Correlation damping	Pairwise ρ factors between QI fields	Simplified — real correlations are value-dependent
Re-emergence model	Exponential decay with broker-specific λ	Parameters are estimates, not calibrated against data
Removal ordering	Greedy by bits/difficulty	Assumes submodularity (not proven for this objective)
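
To make the re-emergence row concrete, here is one plausible reading of "exponential decay with broker-specific λ": the chance that a removal is still holding after t days decays as e^(-λt). The λ value below is made up, and the function names are illustrative:

```typescript
// Probability that a removed record has reappeared after `days`,
// assuming a constant re-listing rate `lambdaPerDay`.
function reemergenceProbability(lambdaPerDay: number, days: number): number {
  if (lambdaPerDay < 0 || days < 0) {
    throw new RangeError("lambda and days must be non-negative");
  }
  return 1 - Math.exp(-lambdaPerDay * days);
}

// Days until the record is as likely as not to have reappeared.
function halfLifeDays(lambdaPerDay: number): number {
  if (lambdaPerDay <= 0) throw new RangeError("lambda must be positive");
  return Math.LN2 / lambdaPerDay;
}

const HYPOTHETICAL_LAMBDA = 0.01; // per day, not a calibrated degauss parameter

console.log(halfLifeDays(HYPOTHETICAL_LAMBDA).toFixed(0)); // days to 50% re-emergence
console.log(reemergenceProbability(HYPOTHETICAL_LAMBDA, 90).toFixed(2));
```

A half-life like this is a natural basis for a re-check schedule: look again well before the broker is likely to have re-listed you.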

How the exposure score works

Each quasi-identifier (name, email, phone, ZIP, DOB) contributes bits of identifying information. A rare surname contributes more bits than a common one. An email is near-unique (~28 bits). Sex is ~1 bit.
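
The per-field figures follow from self-information, -log₂(p), where p is the share of the population sharing that value. A minimal sketch with made-up surname frequencies (the real tool draws these from census data) and an assumed population of 330 million:

```typescript
// Bits of identifying information carried by a value that a fraction
// `probability` of the population shares.
function selfInformation(probability: number): number {
  if (!(probability > 0 && probability <= 1)) {
    throw new RangeError("probability must be in (0, 1]");
  }
  return Math.log2(1 / probability);
}

const POPULATION = 330e6; // assumption

// Frequencies below are illustrative, not census values.
const exampleBits = {
  sex: selfInformation(0.5),                // about 1 bit
  commonSurname: selfInformation(1 / 120),  // hypothetical frequency
  rareSurname: selfInformation(1 / 500000), // hypothetical frequency
  email: selfInformation(1 / POPULATION),   // near-unique, about 28 bits
};

console.log(exampleBits);
```

The rarer the value, the smaller p and the more bits it contributes, which is why a rare surname outweighs a common one.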

The total exposure accounts for correlations between fields — ZIP and city are highly correlated (ρ=0.85), so they don't double-count. Full name subsumes first/last name.

When your total exceeds log₂(population), you're uniquely identifiable. Below that threshold, the anonymity set is 2^(threshold - exposure). Once exposure reaches the threshold that value falls to 1 or less, so the set is reported as 1, meaning you're singled out.
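
A rough sketch of how damping and the anonymity set could fit together. The damping rule here (scale each field by 1 minus its strongest correlation with a field already counted) is our simplification for illustration, not necessarily what degauss does. Only the ZIP/city ρ comes from this README, and the per-field bit values are hypothetical:

```typescript
type Qi = { field: string; bits: number };

// Pairwise correlation factors, keyed by the two field names sorted and joined.
const RHO: Record<string, number> = { "city|zip": 0.85 };

function rho(a: string, b: string): number {
  return RHO[[a, b].sort().join("|")] ?? 0;
}

// Count the most informative fields first, then discount each later field
// by its strongest correlation with anything already counted.
function dampedExposure(qis: Qi[]): number {
  const ordered = [...qis].sort((x, y) => y.bits - x.bits);
  const counted: string[] = [];
  let total = 0;
  for (const qi of ordered) {
    const maxRho = counted.reduce((m, f) => Math.max(m, rho(qi.field, f)), 0);
    total += qi.bits * (1 - maxRho);
    counted.push(qi.field);
  }
  return total;
}

// Effective group size, never reported below 1.
function anonymitySet(exposureBits: number, thresholdBits: number): number {
  return Math.max(1, Math.pow(2, thresholdBits - exposureBits));
}

const exposureBits = dampedExposure([
  { field: "zip", bits: 15 },  // hypothetical
  { field: "city", bits: 10 }, // hypothetical, mostly redundant with ZIP
]);
console.log(exposureBits.toFixed(1), anonymitySet(exposureBits, 28.3).toFixed(0));
```

With the sample output's figures (42.3 bits against a 28.3-bit threshold) the raw formula gives about 2^-14, which the clamp reports as 1.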

How the removal plan works

Your identity is modelled as a graph. Nodes = records on different sources. Edges = linking quasi-identifiers (shared email, shared phone+name, etc.), weighted by mutual information.

The adversary's re-identification power = max-flow through this graph. The optimal removal set = minimum vertex cut (max-flow min-cut theorem). Among equally effective removals, we prioritise by difficulty — a self-service opt-out form beats a notarised letter to LexisNexis.
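
For readers unfamiliar with max-flow, a compact Edmonds-Karp sketch over a capacity matrix follows. The graph, node numbering, and capacities are invented for illustration. Turning this into a minimum vertex cut additionally requires splitting each record node into an in/out pair, which the sketch leaves out:

```typescript
// Edmonds-Karp: repeatedly push flow along the shortest augmenting path (BFS).
function maxFlow(capacity: number[][], source: number, sink: number): number {
  if (source === sink) return 0;
  const n = capacity.length;
  const residual = capacity.map((row) => [...row]); // residual capacities
  let flow = 0;

  while (true) {
    // BFS from source, recording how each node was reached.
    const parent: number[] = new Array<number>(n).fill(-1);
    parent[source] = source;
    const queue: number[] = [source];
    while (queue.length > 0 && parent[sink] === -1) {
      const u = queue.shift() as number;
      for (let v = 0; v < n; v++) {
        if (parent[v] === -1 && residual[u][v] > 0) {
          parent[v] = u;
          queue.push(v);
        }
      }
    }
    if (parent[sink] === -1) return flow; // no augmenting path remains

    // Find the bottleneck on the path, then update residual capacities.
    let bottleneck = Infinity;
    for (let v = sink; v !== source; v = parent[v]) {
      bottleneck = Math.min(bottleneck, residual[parent[v]][v]);
    }
    for (let v = sink; v !== source; v = parent[v]) {
      residual[parent[v]][v] -= bottleneck;
      residual[v][parent[v]] += bottleneck;
    }
    flow += bottleneck;
  }
}

// Hypothetical graph: 0 = adversary, 1 and 2 = broker records, 3 = your identity.
// Edge weights stand in for linking strength in bits.
const linkCapacity = [
  [0, 10, 5, 0],
  [0, 0, 3, 6],
  [0, 0, 0, 8],
  [0, 0, 0, 0],
];

console.log(maxFlow(linkCapacity, 0, 3)); // 14
```

By the max-flow min-cut theorem, 14 is also the total weight of the cheapest set of edges whose removal disconnects node 0 from node 3 (here the two edges into node 3).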

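If the objective is submodular, the greedy ordering achieves at least 1 - 1/e ≈ 63% of optimal (Krause & Golovin 2014). Submodularity has not been proven for this objective, so treat the figure as indicative rather than guaranteed.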
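
A minimal sketch of ordering by bits per unit of difficulty. The bit reductions for spokeo and whitepages echo the sample output earlier. The LexisNexis entry and all difficulty scores are hypothetical, and `orderRemovals` is an illustrative name. This version ranks on static ratios, whereas a fuller greedy would recompute each candidate's marginal gain after every pick:

```typescript
type Removal = { source: string; bitsReduced: number; difficulty: number };

// Best ratio of bits removed to effort first. Difficulty must be positive.
function orderRemovals(candidates: Removal[]): Removal[] {
  return [...candidates].sort(
    (a, b) => b.bitsReduced / b.difficulty - a.bitsReduced / a.difficulty
  );
}

const removalCandidates: Removal[] = [
  { source: "whitepages", bitsReduced: 8.7, difficulty: 1 },
  { source: "lexisnexis", bitsReduced: 9.0, difficulty: 5 }, // hypothetical entry
  { source: "spokeo", bitsReduced: 12.3, difficulty: 1 },
];

const removalOrder = orderRemovals(removalCandidates).map((r) => r.source);
console.log(removalOrder);

// The classic greedy bound for monotone submodular objectives.
const GREEDY_BOUND = 1 - 1 / Math.E;
console.log(GREEDY_BOUND.toFixed(3)); // 0.632
```

The high-effort LexisNexis removal lands last despite removing more bits than whitepages, which is the trade-off the difficulty weighting is meant to capture.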

How data dilution works

When removal fails (public records, government data), dilution increases k-anonymity by adding statistically plausible records sharing the target's anchor fields (usually just name).

If there are 20 "Giuseppe Giona" records with different addresses, phones, and employers, the adversary's confidence drops to 1/20 — adding log₂(20) ≈ 4.3 bits of uncertainty.

Synthetic profiles are internally consistent (city matches ZIP, area code matches region) and varied enough to actually increase entropy.
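
The dilution arithmetic in a few lines, assuming the adversary treats all k records sharing the anchor as equally likely (`dilutionEffect` is an illustrative name):

```typescript
// With k indistinguishable records sharing the anchor fields, the adversary's
// chance of picking the real one is 1/k, which adds log2(k) bits of uncertainty.
function dilutionEffect(k: number): { confidence: number; addedBits: number } {
  if (!Number.isInteger(k) || k < 1) {
    throw new RangeError("k must be a positive integer");
  }
  return { confidence: 1 / k, addedBits: Math.log2(k) };
}

const diluted = dilutionEffect(20);
console.log(diluted.confidence, diluted.addedBits.toFixed(1)); // 0.05 4.3
```

The gain is logarithmic: doubling the number of plausible records adds only one more bit, so dilution complements removal rather than replacing it.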

Stack

packages/core/
  quantify/       entropy, census data, identity graph, exposure reports
  discovery/      broker scanning, supply chain graph, HIBP breaches, canary tokens
  strategy/       Fellegi-Sunter linkage, Jaro-Winkler, attack feasibility assessment
  legal/          GDPR, UK DPA, CCPA, DMCA request generation
  monitor/        re-emergence prediction, monitoring schedules
  dilution/       synthetic profile generation, k-anonymity computation

apps/cli/         20 commands, Tor SOCKS5h routing, state persistence

Data broker supply chain

To our knowledge, no other tool models the broker ecosystem as a directed graph. Brokers don't operate independently: Acxiom feeds Spokeo, LexisNexis feeds BeenVerified, and public records feed everyone. Removing from a leaf broker achieves little if the upstream source still has the data.

degauss maps the known supply chain (21 nodes, 26 edges) and computes the optimal upstream removal strategy using weighted set cover. One upstream removal can cascade downstream, making multiple leaf removals unnecessary.
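
A toy version of the upstream selection, using the standard greedy heuristic for weighted set cover: repeatedly pick the candidate with the lowest cost per newly covered target. The two-source graph and the costs are hypothetical, and the sketch treats a leaf as covered once any one of its feeders is removed, which is a simplification:

```typescript
// Each key feeds the brokers in its list.
type SupplyChain = Record<string, string[]>;

// Every node reachable downstream of `start`.
function downstream(chain: SupplyChain, start: string): Set<string> {
  const seen = new Set<string>();
  const stack: string[] = [start];
  while (stack.length > 0) {
    const node = stack.pop() as string;
    for (const next of chain[node] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        stack.push(next);
      }
    }
  }
  return seen;
}

// Greedy weighted set cover over upstream candidates.
function greedyUpstreamCover(
  chain: SupplyChain,
  cost: Record<string, number>,
  targets: string[]
): string[] {
  const uncovered = new Set(targets);
  const chosen: string[] = [];
  while (uncovered.size > 0) {
    let best: string | null = null;
    let bestRatio = Infinity;
    for (const candidate of Object.keys(cost)) {
      const reach = downstream(chain, candidate);
      const gain = targets.filter((t) => uncovered.has(t) && reach.has(t)).length;
      if (gain > 0 && cost[candidate] / gain < bestRatio) {
        best = candidate;
        bestRatio = cost[candidate] / gain;
      }
    }
    if (best === null) break; // remaining targets have no upstream candidate
    chosen.push(best);
    downstream(chain, best).forEach((t) => uncovered.delete(t));
  }
  return chosen;
}

const brokerChain: SupplyChain = {
  acxiom: ["spokeo", "beenverified"],
  lexisnexis: ["beenverified"],
};
const upstreamCost = { acxiom: 2, lexisnexis: 3 }; // hypothetical effort scores

const upstreamPlan = greedyUpstreamCover(brokerChain, upstreamCost, ["spokeo", "beenverified"]);
console.log(upstreamPlan); // [ 'acxiom' ]
```

Here a single upstream request covers both leaves, which is the cascade effect described above.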

Attack feasibility assessment

Given your exposed data, degauss computes which attack vectors become feasible against you. 11 scenarios (SIM swap, bank pretexting, spear phishing, credential stuffing, doxxing, tax fraud, etc.), each showing what data the attacker needs, impact severity, and specific defensive mitigations. Turns abstract "bits of exposure" into concrete risks with actionable fixes.
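
One simple way to picture the assessment is to list the fields each scenario needs and check how many are already exposed. The sketch below does exactly that. The requirement list is hypothetical, the `dob` and `address` field names are assumptions, and degauss's own percentages presumably come from a richer model than a plain fraction:

```typescript
type Scenario = { name: string; requires: string[] };

// Fraction of the fields an attacker needs that are already exposed.
function feasibility(scenario: Scenario, exposedFields: string[]): number {
  if (scenario.requires.length === 0) return 0;
  const exposed = new Set(exposedFields);
  const matched = scenario.requires.filter((f) => exposed.has(f)).length;
  return matched / scenario.requires.length;
}

// Fields the attacker still lacks, which are the ones worth protecting first.
function missingFields(scenario: Scenario, exposedFields: string[]): string[] {
  const exposed = new Set(exposedFields);
  return scenario.requires.filter((f) => !exposed.has(f));
}

// Hypothetical requirement list for illustration only.
const simSwap: Scenario = {
  name: "SIM swap",
  requires: ["full_name", "phone", "dob", "address"],
};

const exposedNow = ["full_name", "phone", "email"];
console.log(feasibility(simSwap, exposedNow));   // 0.5
console.log(missingFields(simSwap, exposedNow)); // [ 'dob', 'address' ]
```

Framing it this way makes the mitigation obvious: removing any one required field from public view lowers the score for every scenario that depends on it.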

Canary tokens

Offensive privacy. Generate unique URLs and email addresses, embed them in your broker profiles as "contact" links. When someone accesses a canary, you know someone is researching you — when, from where, and sometimes who.

Profile format

{
  "records": [
    {
      "source": "spokeo",
      "url": "https://spokeo.com/Your-Name",
      "qis": [
        { "field": "full_name", "value": "Your Name", "source": "spokeo" },
        { "field": "email", "value": "you@example.com", "source": "spokeo" },
        { "field": "phone", "value": "+447700123456", "source": "spokeo" }
      ],
      "discoveredAt": 1710700000000,
      "status": "active"
    }
  ]
}
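
If you build the profile by hand, it can help to validate it before feeding it to the CLI. The types below are inferred from the example above and are not degauss's published schema. `discoveredAt` looks like a Unix timestamp in milliseconds, and `status` is kept as a plain string because only "active" appears here:

```typescript
interface QuasiIdentifier { field: string; value: string; source: string }
interface SourceRecord {
  source: string;
  url: string;
  qis: QuasiIdentifier[];
  discoveredAt: number;
  status: string;
}
interface Profile { records: SourceRecord[] }

function isObject(x: unknown): x is Record<string, unknown> {
  return typeof x === "object" && x !== null;
}

function isQi(x: unknown): x is QuasiIdentifier {
  return isObject(x) && typeof x.field === "string" &&
    typeof x.value === "string" && typeof x.source === "string";
}

function isSourceRecord(x: unknown): x is SourceRecord {
  return isObject(x) && typeof x.source === "string" && typeof x.url === "string" &&
    Array.isArray(x.qis) && x.qis.every(isQi) &&
    typeof x.discoveredAt === "number" && typeof x.status === "string";
}

function isProfile(x: unknown): x is Profile {
  return isObject(x) && Array.isArray(x.records) && x.records.every(isSourceRecord);
}

// Parse JSON text and fail loudly if it does not match the inferred shape.
function parseProfile(json: string): Profile {
  const data: unknown = JSON.parse(json);
  if (!isProfile(data)) throw new Error("not a valid profile");
  return data;
}

const sampleProfile = parseProfile(JSON.stringify({
  records: [{
    source: "spokeo",
    url: "https://spokeo.com/Your-Name",
    qis: [{ field: "full_name", value: "Your Name", source: "spokeo" }],
    discoveredAt: 1710700000000,
    status: "active",
  }],
}));
console.log(sampleProfile.records[0].qis.length); // 1
```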

Develop

npm install && npm run build
cd packages/core && npm test   # 303 tests

Origin

Four tools, one mathematical thread: measuring what an adversary can learn about your documents (PDF Changer), your identity (threadr), and your transactions (ε-tx). degauss is the fourth: it measures your exposure and then reduces it.

License

MIT
