feat(feeds): AI Incident Database feed — AI-threat-landscape vertical by rinjanianalytics · Pull Request #157 · rinjanianalytics/cti-platform-api

rinjanianalytics · 2026-06-17T02:23:19Z

What

Builds #2 of the free-source roadmap: the AI Incident Database (incidentdatabase.ai) feed — real-world AI harm/failure incidents, the live "what's actually going wrong with deployed AI" signal that complements the static MITRE ATLAS technique taxonomy.

Key design call: a dedicated `ai_incidents` table, NOT `atlas_case_studies`

ATLAS case studies are ~30 curated incidents mapped to AML techniques; AID is ~1500 raw incidents with no technique mapping. Dumping AID into atlas_case_studies would distort the ATLAS coverage heatmap (which counts case studies per technique) and conflate two distinct sources. AI incidents get their own domain table — mirroring how telco got network_elements/fraud_schemes and on-chain got wallets. (Migration 0067; iocs.type-style no-constraint, new table is IF NOT EXISTS.)

Source path: snapshot, not API

AID's GraphQL endpoint is origin-locked (Forbidden — restricted to web browsers), so the official programmatic/research path is the published MongoDB snapshot — a public R2 bucket of mongodump backups (~94 MB tar.bz2). Each archive carries a clean top-level incidents.csv. The connector stream-extracts only that file: peak memory stays ~CSV-sized (1 MB) even though the tar decompresses to hundreds of MB.

Deps added (worker): unbzip2-stream + tar-stream + csv-parse (+ type shims for the two untyped stream libs, one copy per tsconfig program).

Surface

Upsert on natural key incident_id; derived tags (always ai-incident + alleged developer/deployer slugs) so the AI vertical contributes a movers signal like IOC tags.
GET /v1/ai-incidents (filter: q, since, limit)
GET /v1/ai-incidents/stats → total + monthly timeline + top developers (the "incidents over time" trend)
Registered as aiid; scheduled weekly (Mon 02:15 UTC — AID adds a handful/week, snapshot is large).

Verification (boundary-tested, per the #125/#130 lesson)

Ran the full pipeline end-to-end against a local DB, not just unit tests:

download → bz2 → tar → incidents.csv → parse (1517 rows) → map → real drizzle upsert (the excluded.* set, date column, jsonb arrays) → stats
idempotent: re-upsert held the count at 1517
stats signal is real: top alleged developers deepfake-technology-developers (381), openai (158), google (88)
gateway tsc (strict gate) + dockerfile-deps guard + api tests (15-feed registry) all green

Deploy notes

Has a migration (0067_ai_incidents.sql) — unlike OFAC. Deploy must run db:apply after the image rebuild.
Dashboard surface (an AI-incidents page / overview trend) is the natural follow-up, like the telco/onchain pages.

🤖 Generated with Claude Code

…ical #2 of the free-source roadmap. Ingests real-world AI harm/failure incidents from the AI Incident Database (incidentdatabase.ai) — the live "what's actually going wrong with deployed AI" signal that complements the static MITRE ATLAS technique taxonomy. Dedicated `ai_incidents` table (migration 0067), deliberately NOT atlas_case_studies: ATLAS case studies are ~30 curated incidents mapped to AML techniques; AID is ~1500 raw incidents with no technique mapping — mixing them would distort the ATLAS coverage view. AI incidents are their own domain entity, mirroring telco (network_elements/fraud_schemes) and on-chain (wallets). Source: AID's GraphQL API. It gates non-browser callers ("restricted to web browsers") but allows a same-site `Origin` + a browser `User-Agent` — the data is openly licensed for research, the gate is anti-abuse. The connector (apps/worker/src/feeds/ai-incidents.ts) pages incidents(pagination,sort) — ~8 small requests for the ~1.5k corpus — so it runs daily. The API is richer than the CSV snapshot: alleged-party relations carry both an `entity_id` slug (clean tags) and a human `name` (display). Upsert on natural key incident_id; derived `tags` (always `ai-incident` + developer/deployer slugs) so the AI vertical contributes a movers signal like IOC tags. Read route GET /v1/ai-incidents + /ai-incidents/stats (total + monthly timeline + top developers — the "incidents over time" trend). Registered as `aiid`; scheduled daily (02:15 UTC). Verified end-to-end against the live API + local DB: paged 1525 incidents → map → drizzle upsert (0 failed), names resolve ("OpenAI", "Google DeepMind"); idempotent; gateway tsc + api tests (15-feed registry) green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rinjanianalytics force-pushed the feat/ai-incident-database-feed branch from 22a97d2 to c208d7d Compare June 17, 2026 02:23

rinjanianalytics force-pushed the feat/ai-incident-database-feed branch from c208d7d to 444d99b Compare June 17, 2026 02:37

rinjanianalytics merged commit d6cd931 into master Jun 17, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(feeds): AI Incident Database feed — AI-threat-landscape vertical#157

feat(feeds): AI Incident Database feed — AI-threat-landscape vertical#157
rinjanianalytics merged 1 commit into
masterfrom
feat/ai-incident-database-feed

rinjanianalytics commented Jun 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rinjanianalytics commented Jun 17, 2026

What

Key design call: a dedicated ai_incidents table, NOT atlas_case_studies

Source path: snapshot, not API

Surface

Verification (boundary-tested, per the #125/#130 lesson)

Deploy notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Key design call: a dedicated `ai_incidents` table, NOT `atlas_case_studies`