Breach Scanning
The scan engine checks every employee email against every configured
breach-intelligence provider, persists matches, and fires notifications. It
lives in src/lib/scan/.
A provider implements one contract (src/lib/scan/types.ts):
interface BreachProvider {
  id: ApiProvider // HIBP | DEHASHED | LEAKCHECK | INTELX | SNUSBASE
  source: BreachSource // HIBP | MANUAL | DARK_WEB
  lookup(email: string, apiKey: string): Promise<Finding[]>
}
A Finding is the normalized result shared across providers:
interface Finding {
  name: string // breach identifier, used as the Breach unique key
  breachDate: Date // epoch (1970) when the provider does not expose it
  dataTypes: string[] // exposed data types, normalized to snake_case
}
Wired providers are registered in src/lib/scan/registry.ts:
| Provider | ApiProvider id | Source |
|---|---|---|
| Have I Been Pwned | HIBP | HIBP |
| LeakCheck | LEAKCHECK | dark web |
| DeHashed | DEHASHED | dark web |
| Intelligence X | INTELX | dark web |
| Snusbase | SNUSBASE | dark web |
To add a source, implement BreachProvider, add it to the PROVIDERS array in the registry, and add a value to the ApiProvider enum.
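As an illustration of the first step, here is a minimal sketch of a provider that satisfies the contract. Everything specific to it is hypothetical: the EXAMPLE id, the RawHit shape, the toSnakeCase and toFinding helpers, and the injected fetchHits function stand in for a real upstream API. ApiProvider is modeled as a string union here for brevity, whereas the project uses an enum.

```typescript
// Types mirrored from the contract above. "EXAMPLE" is a hypothetical id added for this sketch.
type ApiProvider = "HIBP" | "DEHASHED" | "LEAKCHECK" | "INTELX" | "SNUSBASE" | "EXAMPLE";
type BreachSource = "HIBP" | "MANUAL" | "DARK_WEB";

interface Finding {
  name: string;
  breachDate: Date;
  dataTypes: string[];
}

interface BreachProvider {
  id: ApiProvider;
  source: BreachSource;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}

// Hypothetical raw record returned by an imaginary upstream API.
interface RawHit {
  title: string;
  date?: string;
  fields: string[];
}

// Normalize labels such as "Hashed Password" or "creditCard" to snake_case.
function toSnakeCase(label: string): string {
  return label
    .trim()
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2")
    .replace(/[^A-Za-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "")
    .toLowerCase();
}

// Map a raw hit to the shared Finding shape; a missing or unparsable date becomes the epoch.
function toFinding(hit: RawHit): Finding {
  const parsed = hit.date ? new Date(hit.date) : new Date(0);
  return {
    name: hit.title,
    breachDate: Number.isNaN(parsed.getTime()) ? new Date(0) : parsed,
    dataTypes: hit.fields.map(toSnakeCase),
  };
}

// fetchHits is injected so the sketch runs without network access.
function makeExampleProvider(
  fetchHits: (email: string, apiKey: string) => Promise<RawHit[]>,
): BreachProvider {
  return {
    id: "EXAMPLE",
    source: "DARK_WEB",
    async lookup(email, apiKey) {
      const hits = await fetchHits(email, apiKey);
      return hits.map(toFinding);
    },
  };
}
```

A real provider would replace fetchHits with an HTTP call to its upstream service. The registry entry and the enum value are not shown here.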
runScan(companyId, providers) in src/lib/scan/runner.ts:
- Loads all employees for the company with their existing breachRecords.
- Resolves alert recipients (company admins, only if email is enabled) and active webhooks once, up front.
- For each employee, for each active provider, calls lookup(). Provider errors are isolated (caught and skipped) so one failing provider never aborts the scan.
- Each new finding is persisted by persistFinding: upsert the Breach, skip if the employee is already linked, otherwise create the BreachRecord + Alert, then send email and dispatch webhooks.
- Sleeps RATE_LIMIT_MS (1500 ms) between employees to stay within provider rate limits.
Returns { scanned, newRecords, newAlerts }.
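The control flow of the scan can be sketched as below. This is not the actual runner: persistence, email, and webhooks are replaced by an in-memory set of known breach names per employee, and the names runScanSketch, ActiveProvider, Employee, and knownBreaches are assumptions made for this example. Whether the real runner also sleeps after the last employee is not stated, so this sketch assumes it does not.

```typescript
interface Finding {
  name: string;
  breachDate: Date;
  dataTypes: string[];
}

// Hypothetical shape of a provider paired with its already-decrypted key.
interface ActiveProvider {
  id: string;
  apiKey: string;
  lookup(email: string, apiKey: string): Promise<Finding[]>;
}

// Hypothetical in-memory stand-in for an employee and their linked breaches.
interface Employee {
  email: string;
  knownBreaches: Set<string>;
}

interface ScanResult {
  scanned: number;
  newRecords: number;
  newAlerts: number;
}

const RATE_LIMIT_MS = 1500;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function runScanSketch(
  employees: Employee[],
  providers: ActiveProvider[],
  delayMs: number = RATE_LIMIT_MS,
): Promise<ScanResult> {
  const result: ScanResult = { scanned: 0, newRecords: 0, newAlerts: 0 };

  for (let i = 0; i < employees.length; i++) {
    const employee = employees[i];

    for (const provider of providers) {
      let findings: Finding[];
      try {
        findings = await provider.lookup(employee.email, provider.apiKey);
      } catch {
        // Isolate the failure: skip this provider and keep scanning.
        continue;
      }

      for (const finding of findings) {
        // Already linked to this employee: nothing new to record.
        if (employee.knownBreaches.has(finding.name)) continue;
        employee.knownBreaches.add(finding.name);
        result.newRecords += 1; // stands in for creating the BreachRecord
        result.newAlerts += 1; // stands in for creating the Alert
      }
    }

    result.scanned += 1;
    // Pause between employees (assumed: no pause after the last one).
    if (i < employees.length - 1) await sleep(delayMs);
  }

  return result;
}
```

Passing delayMs as a parameter is a choice made for this sketch so the loop can be exercised quickly in tests. The default keeps the documented 1500 ms pause.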
Active providers are loaded by loadActiveProviders, which decrypts each
stored API key server-side and stamps lastUsedAt.
Severity is derived from exposed data types (severityFor in the runner).
The critical set is:
password, hashed_password, credit_card, ssn, bank_account
| Critical types in the finding | Severity |
|---|---|
| 2 or more | CRITICAL |
| exactly 1 | HIGH |
| 0 | MEDIUM |
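The table maps directly onto a small function. The version below is a sketch of what severityFor might look like, not the project's code. The Severity type and the CRITICAL_TYPES constant are names assumed for this example, and counting each critical type once even when it repeats is also an assumption.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM";

// The critical set listed above.
const CRITICAL_TYPES: ReadonlySet<string> = new Set([
  "password",
  "hashed_password",
  "credit_card",
  "ssn",
  "bank_account",
]);

function severityFor(dataTypes: string[]): Severity {
  // Deduplicate first so a repeated type is counted once (assumption).
  const distinct = Array.from(new Set(dataTypes));
  const criticalCount = distinct.filter((t) => CRITICAL_TYPES.has(t)).length;

  if (criticalCount >= 2) return "CRITICAL";
  if (criticalCount === 1) return "HIGH";
  return "MEDIUM";
}
```

For example, severityFor(["email", "password"]) yields HIGH, severityFor(["password", "ssn"]) yields CRITICAL, and severityFor(["email"]) yields MEDIUM.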
POST /api/employees/scan (any authenticated user). The route enforces three
guards:
- Rate limit: 5 scans per company per minute, else 429.
- No provider configured: 503 with a prompt to add a key in Data API.
- Concurrency: one running scan per company at a time, else 409.
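The three guards can be modeled as a pure function over per-company state, sketched below. The names GuardState, GuardRejection, and checkScanGuards are hypothetical, as are the message strings. The sketch assumes a sliding one-minute window and assumes the guards are evaluated in the order listed; the actual route may differ on both points.

```typescript
// Hypothetical snapshot of the per-company state the route would consult.
interface GuardState {
  recentScanTimestamps: number[]; // start times (ms) of recent scans for this company
  activeProviderCount: number;
  scanRunning: boolean;
}

interface GuardRejection {
  status: 429 | 503 | 409;
  message: string;
}

const MAX_SCANS_PER_WINDOW = 5;
const WINDOW_MS = 60000;

// Returns null when the scan may proceed, otherwise the rejection to send back.
function checkScanGuards(state: GuardState, now: number): GuardRejection | null {
  // Guard 1: rate limit over a sliding one-minute window (assumed window type).
  const recent = state.recentScanTimestamps.filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_SCANS_PER_WINDOW) {
    return { status: 429, message: "Too many scans, try again in a minute." };
  }

  // Guard 2: at least one provider key must be configured.
  if (state.activeProviderCount === 0) {
    return { status: 503, message: "No provider configured. Add a key in Data API." };
  }

  // Guard 3: only one running scan per company.
  if (state.scanRunning) {
    return { status: 409, message: "A scan is already running for this company." };
  }

  return null;
}
```

Keeping the check free of I/O makes each guard easy to test in isolation. A route handler would gather the state, call the function, and translate a non-null result into the HTTP response.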
See API Reference for the full endpoint list and Configuration for where keys come from.
DataShield is source-available software by Melvin PETIT (WhiteMuush). Work in progress, not production ready.