Summary
Add entity extraction from documents with relationship tracking and graph visualization. This would enable journalists to automatically discover people, organizations, locations, and money flows across document collections.
Design
Entity Extraction Pipeline
- Chunk document text during ingestion
- LLM extracts entities + relationships per chunk (structured output)
- Merge/dedupe entities across chunks
- Resolve against existing entities in database
- Store entities, mentions, and relationships
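The proposal does not say how documents are chunked. Below is a minimal sketch of the first step. It assumes fixed-size character windows with a small overlap. `chunk_text`, `max_chars` and `overlap` are hypothetical names, and a production pipeline might instead split on sentences or on model tokens.

```rust
/// Split `text` into windows of at most `max_chars` characters. The last
/// `overlap` characters of each window are repeated at the start of the next,
/// so a short entity mention cut by a boundary still appears whole in one chunk.
/// Operates on `char`s rather than bytes, so UTF-8 text is never split
/// in the middle of a character.
fn chunk_text(text: &str, max_chars: usize, overlap: usize) -> Vec<String> {
    assert!(max_chars > overlap, "max_chars must be larger than overlap");
    let chars: Vec<char> = text.chars().collect();
    let step = max_chars - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + max_chars).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // last window reached the end of the text
        }
        start += step;
    }
    chunks
}

fn main() {
    // Tiny sizes for illustration; real windows would be far larger.
    for chunk in chunk_text("abcdefghij", 4, 1) {
        println!("{}", chunk);
    }
}
```

Each chunk would then be sent to the LLM for extraction, and the per-chunk results merged before resolution.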
Storage
SQLite database for entities with:
- entities - canonical entities with aliases and attributes
- mentions - links entities to documents with context snippets
- relationships - typed connections between entities (employed_by, paid_by, etc.)
- FTS5 virtual table for entity name search
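The proposal names the tables but not their columns. The following is a possible schema, written as a Rust constant. With the rusqlite crate it could be applied via `Connection::execute_batch(SCHEMA)`. All column names, the JSON-encoded `aliases`/`attributes` columns and the standalone FTS5 table are assumptions, and FTS5 must be enabled in the SQLite build being used.

```rust
// Hypothetical schema sketch: column names and encodings are assumptions.
const SCHEMA: &str = r#"
CREATE TABLE IF NOT EXISTS entities (
    id          TEXT PRIMARY KEY,            -- ULID
    entity_type TEXT NOT NULL,               -- person, organization, location, ...
    name        TEXT NOT NULL,               -- canonical name
    aliases     TEXT NOT NULL DEFAULT '[]',  -- JSON array of alternate names
    attributes  TEXT NOT NULL DEFAULT '{}'   -- JSON object
);

CREATE TABLE IF NOT EXISTS mentions (
    entity_id   TEXT NOT NULL REFERENCES entities(id),
    document_id TEXT NOT NULL,
    snippet     TEXT NOT NULL                -- surrounding context
);

CREATE TABLE IF NOT EXISTS relationships (
    source_id   TEXT NOT NULL REFERENCES entities(id),
    target_id   TEXT NOT NULL REFERENCES entities(id),
    kind        TEXT NOT NULL,               -- employed_by, paid_by, ...
    document_id TEXT                         -- supporting document, if any
);

-- Indexes so graph traversal and co-occurrence queries avoid full scans.
CREATE INDEX IF NOT EXISTS idx_mentions_entity   ON mentions(entity_id);
CREATE INDEX IF NOT EXISTS idx_mentions_document ON mentions(document_id);
CREATE INDEX IF NOT EXISTS idx_rel_source        ON relationships(source_id);
CREATE INDEX IF NOT EXISTS idx_rel_target        ON relationships(target_id);

-- Standalone FTS5 index for name search; the application inserts a row here
-- whenever it creates or renames an entity. entity_id is stored, not indexed.
CREATE VIRTUAL TABLE IF NOT EXISTS entities_fts
    USING fts5(entity_id UNINDEXED, name, aliases);
"#;

fn main() {
    println!("{}", SCHEMA);
}
```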
Rust Types
- Entity, EntityId, EntityType - core entity model
- Mention - entity occurrence in a document
- Relationship - typed edge between entities
- ExtractionResult - LLM structured output format
- EntityGraph - nodes/edges for visualization
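A minimal sketch of how these types could look, using only the standard library. The field names, the `Other` variant, and the helper types `ExtractedEntity` and `ExtractedRelationship` are hypothetical. A real version would probably derive serde's `Deserialize` on `ExtractionResult` to parse the model's JSON output.

```rust
use std::collections::HashMap;

/// ULID in its 26-character Crockford base32 form; string order matches creation order.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub String);

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Organization,
    Location,
    Other(String), // hypothetical catch-all for types not listed in the proposal
}

/// Canonical entity (one row of `entities`).
#[derive(Debug, Clone)]
pub struct Entity {
    pub id: EntityId,
    pub entity_type: EntityType,
    pub name: String,
    pub aliases: Vec<String>,
    pub attributes: HashMap<String, String>,
}

/// One occurrence of an entity in a document (one row of `mentions`).
#[derive(Debug, Clone)]
pub struct Mention {
    pub entity_id: EntityId,
    pub document_id: String,
    pub snippet: String,
}

/// Typed edge between two entities (one row of `relationships`).
#[derive(Debug, Clone)]
pub struct Relationship {
    pub source: EntityId,
    pub target: EntityId,
    pub kind: String, // e.g. "employed_by", "paid_by"
}

/// Per-chunk LLM output. Entities are referenced by name because they
/// have not yet been resolved to IDs.
#[derive(Debug, Clone, Default)]
pub struct ExtractionResult {
    pub entities: Vec<ExtractedEntity>,
    pub relationships: Vec<ExtractedRelationship>,
}

#[derive(Debug, Clone)]
pub struct ExtractedEntity {
    pub name: String,
    pub entity_type: EntityType,
}

#[derive(Debug, Clone)]
pub struct ExtractedRelationship {
    pub source_name: String,
    pub target_name: String,
    pub kind: String,
}

/// Nodes and edges handed to the UI for visualization.
#[derive(Debug, Clone, Default)]
pub struct EntityGraph {
    pub nodes: Vec<Entity>,
    pub edges: Vec<Relationship>,
}

fn main() {
    let person = Entity {
        id: EntityId("01ARZ3NDEKTSV4RRFFQ69G5FAV".to_string()),
        entity_type: EntityType::Person,
        name: "Jane Doe".to_string(),
        aliases: vec!["J. Doe".to_string()],
        attributes: HashMap::new(),
    };
    let graph = EntityGraph { nodes: vec![person], edges: Vec::new() };
    println!("{:#?}", graph);
}
```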
UI Components
- EntityExplorer - main container with list/graph toggle
- EntityList - searchable/filterable sidebar
- EntityGraph - force-directed canvas visualization with zoom/pan
- EntityDetail - tabs for mentions, relationships, attributes
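The layout math behind a simple force-directed graph does not depend on the language. The canvas component would run it in the frontend, but it is sketched here in Rust to match the other examples. It assumes one common scheme: inverse-square repulsion between every pair of nodes, Hooke's-law springs along edges, and velocity damping. The constants are arbitrary, and `Node` and `tick` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy)]
struct Node {
    x: f64,
    y: f64,
    vx: f64,
    vy: f64,
}

/// Advance the layout by one step (unit mass, unit time step).
fn tick(nodes: &mut [Node], edges: &[(usize, usize)], repulsion: f64,
        spring_k: f64, spring_len: f64, damping: f64) {
    let n = nodes.len();
    let mut fx = vec![0.0_f64; n];
    let mut fy = vec![0.0_f64; n];

    // Pairwise repulsion, O(n^2): acceptable for a few hundred nodes.
    for i in 0..n {
        for j in (i + 1)..n {
            let dx = nodes[j].x - nodes[i].x;
            let dy = nodes[j].y - nodes[i].y;
            let dist2 = (dx * dx + dy * dy).max(0.01); // avoid division by zero
            let dist = dist2.sqrt();
            let f = repulsion / dist2;
            let (ux, uy) = (dx / dist, dy / dist);
            fx[i] -= f * ux; // push i away from j
            fy[i] -= f * uy;
            fx[j] += f * ux; // and j away from i
            fy[j] += f * uy;
        }
    }

    // Springs pull connected nodes toward the rest length `spring_len`.
    for &(a, b) in edges {
        let dx = nodes[b].x - nodes[a].x;
        let dy = nodes[b].y - nodes[a].y;
        let dist = (dx * dx + dy * dy).sqrt().max(0.1);
        let f = spring_k * (dist - spring_len); // positive when stretched
        let (ux, uy) = (dx / dist, dy / dist);
        fx[a] += f * ux;
        fy[a] += f * uy;
        fx[b] -= f * ux;
        fy[b] -= f * uy;
    }

    // Integrate; damping bleeds off energy so the layout settles.
    for (i, node) in nodes.iter_mut().enumerate() {
        node.vx = (node.vx + fx[i]) * damping;
        node.vy = (node.vy + fy[i]) * damping;
        node.x += node.vx;
        node.y += node.vy;
    }
}

fn main() {
    // A triangle plus one pendant node, starting from a slightly jittered line.
    let mut nodes: Vec<Node> = (0..4)
        .map(|i| Node { x: i as f64 * 5.0, y: (i % 2) as f64, vx: 0.0, vy: 0.0 })
        .collect();
    let edges = [(0, 1), (1, 2), (2, 0), (2, 3)];
    for _ in 0..300 {
        tick(&mut nodes, &edges, 500.0, 0.05, 50.0, 0.85);
    }
    for (i, n) in nodes.iter().enumerate() {
        println!("node {}: ({:.1}, {:.1})", i, n.x, n.y);
    }
}
```

Zoom and pan only change how positions are drawn, so they can be a canvas transform applied after the simulation.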
Key Decisions
- Entities are derived data (like embeddings) - each peer recomputes them locally, so they are not synced between peers
- SQLite for relational queries (graph traversal, co-occurrences)
- ULID for entity IDs (sortable, no coordination needed)
- A simple force simulation drawn on a canvas, rather than adding a D3 dependency
- Entity resolution: exact match → alias match → fuzzy match → create new
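The resolution order can be sketched as a cascade that returns at the first stage that matches. Several things here are assumptions. "Exact" means equal after lowercasing and collapsing whitespace. Fuzzy similarity is normalized Levenshtein distance, which may differ from the fuzzy matcher actually used. The 0.85 threshold is arbitrary. `Known`, `Resolution` and `resolve` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Resolution {
    Exact(usize), // index of the matched existing entity
    Alias(usize),
    Fuzzy(usize),
    New, // caller creates a new entity with a fresh ULID
}

/// Minimal stand-in for a stored entity.
struct Known {
    name: String,
    aliases: Vec<String>,
}

/// Lowercase and collapse runs of whitespace.
fn normalize(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ").to_lowercase()
}

/// Classic two-row Levenshtein edit distance over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

fn resolve(name: &str, known: &[Known], threshold: f64) -> Resolution {
    let n = normalize(name);
    // 1. Exact match on the canonical name.
    if let Some(i) = known.iter().position(|k| normalize(&k.name) == n) {
        return Resolution::Exact(i);
    }
    // 2. Exact match on any alias.
    if let Some(i) = known.iter().position(|k| k.aliases.iter().any(|a| normalize(a) == n)) {
        return Resolution::Alias(i);
    }
    // 3. Best fuzzy match over names and aliases, if above the threshold.
    let mut best: Option<(usize, f64)> = None;
    for (i, k) in known.iter().enumerate() {
        for cand in std::iter::once(&k.name).chain(k.aliases.iter()) {
            let c = normalize(cand);
            let max_len = n.chars().count().max(c.chars().count()).max(1);
            let sim = 1.0 - levenshtein(&n, &c) as f64 / max_len as f64;
            if sim >= threshold && best.map_or(true, |(_, s)| sim > s) {
                best = Some((i, sim));
            }
        }
    }
    // 4. Otherwise the caller creates a new entity.
    match best {
        Some((i, _)) => Resolution::Fuzzy(i),
        None => Resolution::New,
    }
}

fn main() {
    let known = vec![Known {
        name: "Acme Corporation".to_string(),
        aliases: vec!["Acme Corp".to_string()],
    }];
    for query in ["acme  corporation", "ACME Corp", "Acme Corporaton", "Globex"] {
        println!("{:<20} -> {:?}", query, resolve(query, &known, 0.85));
    }
}
```

Against the one known entity, the four queries resolve to `Exact(0)`, `Alias(0)`, `Fuzzy(0)` (one missing letter) and `New`. String similarity alone can merge distinct people with similar names, so the threshold choice matters.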
Agent Integration
New tool for the research agent:
find_entity_connections(entity_name, max_hops) -> EntityGraph
Enables queries like: "What's the connection between Senator X and Company Y?"
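The proposal gives only the tool's signature. Below is a minimal sketch of the traversal, under several assumptions. `entity_name` has already been resolved to an entity ID, for example with the resolution cascade. Relationships are loaded into memory. They are followed in both directions. `Edge` and `Subgraph` are simplified, hypothetical stand-ins for `Relationship` and `EntityGraph`. Inside SQLite, the same bounded traversal could instead be written as a `WITH RECURSIVE` query.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

#[derive(Debug, Clone, PartialEq)]
struct Edge {
    source: String,
    target: String,
    kind: String,
}

#[derive(Debug, Default)]
struct Subgraph {
    nodes: BTreeSet<String>,
    edges: Vec<Edge>,
}

/// Breadth-first search from `start`, stopping at `max_hops`. Returns every
/// entity reached plus all relationships among them (the induced subgraph).
fn find_entity_connections(edges: &[Edge], start: &str, max_hops: usize) -> Subgraph {
    // Undirected adjacency list.
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for e in edges {
        adj.entry(e.source.as_str()).or_default().push(e.target.as_str());
        adj.entry(e.target.as_str()).or_default().push(e.source.as_str());
    }

    let mut dist: HashMap<&str, usize> = HashMap::new();
    let mut queue = VecDeque::new();
    dist.insert(start, 0);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        let d = dist[node];
        if d == max_hops {
            continue; // do not expand beyond the hop limit
        }
        for &next in adj.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
            if !dist.contains_key(next) {
                dist.insert(next, d + 1);
                queue.push_back(next);
            }
        }
    }

    let nodes: BTreeSet<String> = dist.keys().map(|s| s.to_string()).collect();
    let sub_edges: Vec<Edge> = edges
        .iter()
        .filter(|e| nodes.contains(&e.source) && nodes.contains(&e.target))
        .cloned()
        .collect();
    Subgraph { nodes, edges: sub_edges }
}

fn main() {
    let edge = |s: &str, t: &str, k: &str| Edge {
        source: s.to_string(),
        target: t.to_string(),
        kind: k.to_string(),
    };
    let edges = vec![
        edge("senator_x", "pac_z", "paid_by"),
        edge("pac_z", "company_y", "paid_by"),
        edge("company_y", "person_w", "employed_by"),
    ];
    let g = find_entity_connections(&edges, "senator_x", 2);
    println!("{:?}", g.nodes);
    println!("{:?}", g.edges);
}
```

For a question like the one above, the agent could call the tool on Senator X and check whether Company Y appears in the returned subgraph. Here it is two hops away, via the hypothetical `pac_z`.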