Breach Scanning

The scan engine checks every employee email against every configured breach-intelligence provider, persists new matches, and sends notifications (alert emails and webhooks). It lives in src/lib/scan/.

Providers

A provider implements one contract (src/lib/scan/types.ts):

interface BreachProvider {
  id: ApiProvider          // HIBP | DEHASHED | LEAKCHECK | INTELX | SNUSBASE
  source: BreachSource     // HIBP | MANUAL | DARK_WEB
  lookup(email: string, apiKey: string): Promise<Finding[]>
}
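Because lookup() is the only method the runner calls on a provider, a provider can be stubbed without any network access, which is handy for exercising the scan pipeline in tests. The sketch below is hypothetical: makeFixtureProvider is not part of the codebase, and ApiProvider and BreachSource are modelled as string unions rather than the project's real enums.

```typescript
// Assumption: the project's ApiProvider / BreachSource enums are modelled as string unions here.
type ApiProvider = "HIBP" | "DEHASHED" | "LEAKCHECK" | "INTELX" | "SNUSBASE";
type BreachSource = "HIBP" | "MANUAL" | "DARK_WEB";

interface Finding {
  name: string;
  breachDate: Date;
  dataTypes: string[];
}

interface BreachProvider {
  id: ApiProvider;
  source: BreachSource;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}

// Hypothetical fixture provider: answers from an in-memory map instead of calling an HTTP API.
function makeFixtureProvider(
  id: ApiProvider,
  source: BreachSource,
  fixtures: Record<string, Finding[]>,
): BreachProvider {
  return {
    id,
    source,
    async lookup(email: string, apiKey: string): Promise<Finding[]> {
      // Mimic a real provider rejecting a call without credentials.
      if (!apiKey) throw new Error(`${id}: missing API key`);
      // Fixture keys are lowercase, so match the email case-insensitively.
      return fixtures[email.toLowerCase()] ?? [];
    },
  };
}
```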

A Finding is the normalized result shared across providers:

interface Finding {
  name: string        // breach identifier, used as the Breach unique key
  breachDate: Date    // epoch (1970) when the provider does not expose it
  dataTypes: string[] // exposed data types, normalized to snake_case
}
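This page does not show how each provider maps its raw payload onto a Finding. The following sketch assumes a hypothetical RawBreach shape and illustrates the two rules stated above: data types are normalized to snake_case, and a missing date falls back to the epoch (treating an unparsable date the same way is an extra assumption). Real providers use different field names and may need an explicit mapping table; for instance, a label such as "Passwords" normalizes to passwords, not to the singular password used in severity scoring.

```typescript
interface Finding {
  name: string;
  breachDate: Date;
  dataTypes: string[];
}

// Hypothetical raw shape a provider might return; real payloads differ per provider.
interface RawBreach {
  name: string;
  date?: string;    // ISO date, when the provider exposes one
  fields: string[]; // provider-specific labels, e.g. "Email addresses", "creditCard"
}

// "Email addresses" -> "email_addresses", "creditCard" -> "credit_card", "IP-Address" -> "ip_address"
function toSnakeCase(label: string): string {
  return label
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // split camelCase boundaries
    .replace(/[^A-Za-z0-9]+/g, "_")          // spaces, dashes, punctuation -> underscore
    .replace(/^_+|_+$/g, "")                 // trim leading/trailing underscores
    .toLowerCase();
}

function toFinding(raw: RawBreach): Finding {
  const parsed = raw.date ? new Date(raw.date) : new Date(0);
  return {
    name: raw.name,
    // Fall back to the epoch (1970-01-01) when the date is missing or unparsable.
    breachDate: Number.isNaN(parsed.getTime()) ? new Date(0) : parsed,
    // Normalize, drop empty labels, and de-duplicate.
    dataTypes: Array.from(new Set(raw.fields.map(toSnakeCase).filter((t) => t.length > 0))),
  };
}
```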

The providers currently wired in are registered in src/lib/scan/registry.ts:

Provider	`ApiProvider` id	`BreachSource`
Have I Been Pwned	`HIBP`	`HIBP`
LeakCheck	`LEAKCHECK`	`DARK_WEB`
DeHashed	`DEHASHED`	`DARK_WEB`
Intelligence X	`INTELX`	`DARK_WEB`
Snusbase	`SNUSBASE`	`DARK_WEB`

To add a source: add a value to the ApiProvider enum, implement BreachProvider, and add the implementation to the PROVIDERS array in the registry.
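A sketch of the registry pattern follows. The stub lookups are placeholders, and the ALL_IDS record with its unregisteredIds() check is a suggestion (not taken from registry.ts) for catching the case where a value is added to ApiProvider but no implementation is added to PROVIDERS: because ALL_IDS is typed as Record<ApiProvider, true>, the compiler rejects it until the new id is listed, and unregisteredIds() then reports the missing provider.

```typescript
type ApiProvider = "HIBP" | "DEHASHED" | "LEAKCHECK" | "INTELX" | "SNUSBASE";
type BreachSource = "HIBP" | "MANUAL" | "DARK_WEB";
interface Finding { name: string; breachDate: Date; dataTypes: string[] }
interface BreachProvider {
  id: ApiProvider;
  source: BreachSource;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}

// Placeholder implementations; a real provider would call its HTTP API in lookup().
const stub = (id: ApiProvider, source: BreachSource): BreachProvider => ({
  id,
  source,
  lookup: async () => [],
});

const PROVIDERS: BreachProvider[] = [
  stub("HIBP", "HIBP"),
  stub("LEAKCHECK", "DARK_WEB"),
  stub("DEHASHED", "DARK_WEB"),
  stub("INTELX", "DARK_WEB"),
  stub("SNUSBASE", "DARK_WEB"),
];

// Keyed by the union: adding an ApiProvider value makes this object fail to compile until updated.
const ALL_IDS: Record<ApiProvider, true> = {
  HIBP: true, DEHASHED: true, LEAKCHECK: true, INTELX: true, SNUSBASE: true,
};

// Find the provider for an id, or undefined when none is registered.
function getProvider(id: ApiProvider): BreachProvider | undefined {
  return PROVIDERS.find((p) => p.id === id);
}

// Ids declared in ApiProvider but missing from PROVIDERS (expected to be empty).
function unregisteredIds(): ApiProvider[] {
  return (Object.keys(ALL_IDS) as ApiProvider[]).filter((id) => !getProvider(id));
}
```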

How a scan runs

runScan(companyId, providers) in src/lib/scan/runner.ts runs the following steps:

Loads all employees for the company with their existing breachRecords.
Resolves alert recipients (company admins, only if email is enabled) and active webhooks once, up front.
For each employee, for each active provider, calls lookup(). Provider errors are isolated (caught and skipped) so one failing provider never aborts the scan.
Each new finding is persisted by persistFinding: it upserts the Breach, skips the finding if the employee is already linked to that breach, and otherwise creates the BreachRecord and its Alert, then sends the alert email and dispatches webhooks.
Sleeps RATE_LIMIT_MS (1500 ms) between employees to stay within provider rate limits.

Returns { scanned, newRecords, newAlerts }.
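The loop can be sketched as follows. This is a simplified, hypothetical version: runScanSketch takes the employees, the providers with their decrypted keys, and a persist callback as arguments instead of a companyId, and persistInMemory stands in for persistFinding's database upsert and its email and webhook side effects. It shows the two properties described above: a throwing provider is skipped without aborting the scan, and a breach already linked to the employee is not counted again.

```typescript
interface Finding { name: string; breachDate: Date; dataTypes: string[] }
// Hypothetical: linkedBreaches mirrors the employee's existing breachRecords, loaded up front.
interface Employee { id: string; email: string; linkedBreaches: Set<string> }
// Hypothetical: a provider paired with its decrypted API key.
interface ActiveProvider {
  id: string;
  apiKey: string;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}
interface ScanResult { scanned: number; newRecords: number; newAlerts: number }

// Hypothetical stand-in for persistFinding: resolves true when a new BreachRecord (and Alert) was created.
type Persist = (employee: Employee, finding: Finding) => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runScanSketch(
  employees: Employee[],
  providers: ActiveProvider[],
  persist: Persist,
  rateLimitMs = 1500, // mirrors RATE_LIMIT_MS
): Promise<ScanResult> {
  const result: ScanResult = { scanned: 0, newRecords: 0, newAlerts: 0 };
  for (let i = 0; i < employees.length; i++) {
    const employee = employees[i];
    for (const provider of providers) {
      let findings: Finding[] = [];
      try {
        findings = await provider.lookup(employee.email, provider.apiKey);
      } catch {
        continue; // isolate provider errors: skip this provider, keep scanning
      }
      for (const finding of findings) {
        if (await persist(employee, finding)) {
          result.newRecords++;
          result.newAlerts++; // one Alert is created per new BreachRecord
        }
      }
    }
    result.scanned++;
    // Pause between employees (not after the last one) to stay within provider rate limits.
    if (i < employees.length - 1) await sleep(rateLimitMs);
  }
  return result;
}

// In-memory persist: skip when the employee is already linked to this breach name.
const persistInMemory: Persist = async (employee, finding) => {
  if (employee.linkedBreaches.has(finding.name)) return false;
  employee.linkedBreaches.add(finding.name);
  return true;
};
```

Running the sketch twice over the same data yields no new records the second time, which is the idempotence the linked-employee check provides.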

Active providers are loaded by loadActiveProviders, which decrypts each stored API key server-side and stamps lastUsedAt.
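A possible shape for loadActiveProviders is sketched below. The StoredApiKey row, its isActive flag, and the decrypt callback are assumptions: this page documents neither the key schema nor the cipher, so decryption is injected rather than implemented, and assigning lastUsedAt on the row stands in for the database update.

```typescript
type ApiProvider = "HIBP" | "DEHASHED" | "LEAKCHECK" | "INTELX" | "SNUSBASE";
interface Finding { name: string; breachDate: Date; dataTypes: string[] }
interface BreachProvider {
  id: ApiProvider;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}
// Hypothetical shape of a stored key row; the real schema may differ.
interface StoredApiKey {
  provider: ApiProvider;
  encryptedKey: string;
  isActive: boolean;
  lastUsedAt: Date | null;
}
interface ActiveProvider { provider: BreachProvider; apiKey: string }

function loadActiveProvidersSketch(
  rows: StoredApiKey[],
  registry: BreachProvider[],
  decrypt: (ciphertext: string) => string, // hypothetical; the real cipher is not documented here
  now: Date = new Date(),
): ActiveProvider[] {
  const active: ActiveProvider[] = [];
  for (const row of rows) {
    if (!row.isActive) continue;
    const provider = registry.find((p) => p.id === row.provider);
    if (!provider) continue; // key stored for a provider that is not wired into the registry
    active.push({ provider, apiKey: decrypt(row.encryptedKey) });
    row.lastUsedAt = now; // stand-in for the DB update that stamps lastUsedAt
  }
  return active;
}
```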

Severity scoring

Severity is derived from a finding's exposed data types by severityFor in runner.ts. The critical set is:

password, hashed_password, credit_card, ssn, bank_account

Number of critical types exposed by the finding	Severity
2 or more	`CRITICAL`
exactly 1	`HIGH`
0	`MEDIUM`
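The scoring rule can be expressed as a count over the critical set. This sketch follows the table above; counting distinct critical types (so a duplicated label counts once) is an assumption not stated on this page.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";

// Critical data types, as listed on this page.
const CRITICAL_TYPES = new Set(["password", "hashed_password", "credit_card", "ssn", "bank_account"]);

function severityFor(dataTypes: string[]): Severity {
  // Count distinct critical types so a duplicated label is not counted twice.
  const critical = new Set(dataTypes.filter((t) => CRITICAL_TYPES.has(t))).size;
  if (critical >= 2) return "CRITICAL";
  if (critical === 1) return "HIGH";
  return "MEDIUM";
}
```

For example, a finding exposing email_address and password scores HIGH, while one exposing password and ssn scores CRITICAL.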

Triggering a scan

Scans are started with POST /api/employees/scan, which any authenticated user may call. The route enforces three guards:

Rate limit: 5 scans per company per minute, else 429.
No provider configured: 503, with a message prompting the user to add an API key under Data API.
Concurrency: one running scan per company at a time, else 409.
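The three guards could look like the following in-memory sketch. checkScanGuards, releaseScan, and the error strings are hypothetical; the sliding one-minute window, counting only accepted scans toward the limit, and keeping state in process memory are all assumptions (in-memory state would not be shared across multiple server instances).

```typescript
const WINDOW_MS = 60_000;
const MAX_SCANS_PER_WINDOW = 5;

const recentScans = new Map<string, number[]>(); // companyId -> accepted scan start times (ms)
const runningScans = new Set<string>();          // companyIds with a scan in progress

type GuardResult = { ok: true } | { ok: false; status: 429 | 503 | 409; error: string };

function checkScanGuards(companyId: string, activeProviderCount: number, now = Date.now()): GuardResult {
  // Rate limit: keep only timestamps from the last minute.
  const recent = (recentScans.get(companyId) ?? []).filter((t) => now - t < WINDOW_MS);
  recentScans.set(companyId, recent);
  if (recent.length >= MAX_SCANS_PER_WINDOW) {
    return { ok: false, status: 429, error: "Too many scans, try again later" };
  }
  // At least one provider must be configured.
  if (activeProviderCount === 0) {
    return { ok: false, status: 503, error: "No provider configured, add an API key in Data API" };
  }
  // One running scan per company.
  if (runningScans.has(companyId)) {
    return { ok: false, status: 409, error: "A scan is already running" };
  }
  recent.push(now); // only accepted scans count toward the limit
  runningScans.add(companyId);
  return { ok: true };
}

// Releases the concurrency lock once a scan settles.
function releaseScan(companyId: string): void {
  runningScans.delete(companyId);
}
```

With this design the route would call releaseScan in a finally block after runScan settles, so a failed scan does not leave the company locked.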

See API Reference for the full endpoint list and Configuration for where keys come from.

DataShield is source-available software by Melvin PETIT (WhiteMuush). Work in progress, not production ready.
