Entity extraction and graph visualization #14

@monneyboi

Summary

Add entity extraction from documents with relationship tracking and graph visualization. This would enable journalists to automatically discover people, organizations, locations, and money flows across document collections.

Design

Entity Extraction Pipeline

  1. Chunk document text during ingestion
  2. LLM extracts entities + relationships per chunk (structured output)
  3. Merge/dedupe entities across chunks
  4. Resolve against existing entities in database
  5. Store entities, mentions, and relationships
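Step 3 could be sketched as follows. This is only an illustration under assumptions: the issue does not define a merge key, so the sketch merges on a case- and whitespace-normalized name plus entity type, and the names `ExtractedEntity`, `MergedEntity`, `normalize`, and `merge_chunks` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical entity kinds, guessed from the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Organization,
    Location,
}

/// One entity as returned by the LLM for a single chunk (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedEntity {
    pub name: String,
    pub entity_type: EntityType,
}

/// An entity after merging, remembering which chunks mentioned it.
#[derive(Debug, Clone, PartialEq)]
pub struct MergedEntity {
    pub name: String,
    pub entity_type: EntityType,
    pub chunk_indices: Vec<usize>,
}

/// Lowercase the name and collapse runs of whitespace (assumed merge key).
fn normalize(name: &str) -> String {
    name.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Merge per-chunk extraction results into one deduplicated list.
/// The first spelling seen is kept as the display name.
pub fn merge_chunks(chunks: &[Vec<ExtractedEntity>]) -> Vec<MergedEntity> {
    let mut index: HashMap<(String, EntityType), usize> = HashMap::new();
    let mut merged: Vec<MergedEntity> = Vec::new();

    for (chunk_idx, entities) in chunks.iter().enumerate() {
        for e in entities {
            let key = (normalize(&e.name), e.entity_type);
            match index.get(&key).copied() {
                Some(i) => {
                    // Already known: only record the additional chunk.
                    if !merged[i].chunk_indices.contains(&chunk_idx) {
                        merged[i].chunk_indices.push(chunk_idx);
                    }
                }
                None => {
                    index.insert(key, merged.len());
                    merged.push(MergedEntity {
                        name: e.name.clone(),
                        entity_type: e.entity_type,
                        chunk_indices: vec![chunk_idx],
                    });
                }
            }
        }
    }
    merged
}
```

Keeping the chunk indices on each merged entity would let step 5 write one mention per chunk without re-scanning the text.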

Storage

SQLite database for entities with:

  • entities - canonical entities with aliases and attributes
  • mentions - links entities to documents with context snippets
  • relationships - typed connections between entities (employed_by, paid_by, etc.)
  • FTS5 virtual table for entity name search

Rust Types

  • Entity, EntityId, EntityType - core entity model
  • Mention - entity occurrence in a document
  • Relationship - typed edge between entities
  • ExtractionResult - LLM structured output format
  • EntityGraph - nodes/edges for visualization
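A rough sketch of how these types might look. Only the type names come from this issue; every field, the `EntityType` variants, and the helper structs `ExtractedEntity` and `ExtractedRelationship` are assumptions for discussion. `EntityId` is shown as an opaque string newtype standing in for a ULID.

```rust
use std::collections::BTreeMap;

/// Entity ID. The proposal is a ULID; a string newtype stands in for it here.
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub String);

/// Assumed variants, based on the kinds of entity named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Organization,
    Location,
}

/// Canonical entity with aliases and free-form attributes.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: EntityId,
    pub entity_type: EntityType,
    pub name: String,
    pub aliases: Vec<String>,
    pub attributes: BTreeMap<String, String>,
}

/// Occurrence of an entity in a document, with a context snippet.
#[derive(Debug, Clone, PartialEq)]
pub struct Mention {
    pub entity_id: EntityId,
    pub document_id: String,
    pub snippet: String,
}

/// Typed, directed edge between two entities, e.g. kind = "employed_by".
#[derive(Debug, Clone, PartialEq)]
pub struct Relationship {
    pub source: EntityId,
    pub target: EntityId,
    pub kind: String,
}

/// Entity as named by the LLM, before resolution assigns an ID (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedEntity {
    pub name: String,
    pub entity_type: EntityType,
}

/// Relationship as named by the LLM, referring to entities by name (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedRelationship {
    pub source_name: String,
    pub target_name: String,
    pub kind: String,
}

/// Structured output expected from the LLM for one chunk.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ExtractionResult {
    pub entities: Vec<ExtractedEntity>,
    pub relationships: Vec<ExtractedRelationship>,
}

/// Nodes and edges handed to the visualization.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct EntityGraph {
    pub nodes: Vec<Entity>,
    pub edges: Vec<Relationship>,
}
```

Having `ExtractionResult` refer to entities by name rather than by ID keeps the LLM output independent of the database; IDs would only be assigned during resolution.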

UI Components

  • EntityExplorer - main container with list/graph toggle
  • EntityList - searchable/filterable sidebar
  • EntityGraph - force-directed canvas visualization with zoom/pan
  • EntityDetail - tabs for mentions, relationships, attributes

Key Decisions

  • Entities are derived data (like embeddings) - not synced between peers
  • SQLite for relational queries (graph traversal, co-occurrences)
  • ULID for entity IDs (sortable, no coordination needed)
  • Simple force simulation in canvas rather than D3 dependency
  • Entity resolution: exact match → alias match → fuzzy match → create new
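The resolution cascade in the last bullet might look roughly like this. It is a minimal in-memory sketch, not the proposed SQLite-backed implementation: comparisons are case-insensitive, fuzzy matching uses Levenshtein distance with a caller-supplied threshold, and a `usize` index stands in for the ULID. All of those choices, and the names `Resolution` and `resolve`, are assumptions.

```rust
/// Simplified entity for this sketch; `id` stands in for a ULID.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: usize,
    pub name: String,
    pub aliases: Vec<String>,
}

/// Which stage of the cascade produced the result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Exact(usize),
    Alias(usize),
    Fuzzy(usize),
    Created(usize),
}

/// Classic two-row Levenshtein edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // substitution, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Resolve `name`: exact match, then alias match, then fuzzy match, else create.
pub fn resolve(entities: &mut Vec<Entity>, name: &str, max_distance: usize) -> Resolution {
    let needle = name.trim().to_lowercase();

    // 1. Exact match on the canonical name (case-insensitive by assumption).
    if let Some(e) = entities.iter().find(|e| e.name.to_lowercase() == needle) {
        return Resolution::Exact(e.id);
    }

    // 2. Match against any known alias.
    if let Some(e) = entities
        .iter()
        .find(|e| e.aliases.iter().any(|a| a.to_lowercase() == needle))
    {
        return Resolution::Alias(e.id);
    }

    // 3. Fuzzy match: closest canonical name within the threshold.
    let best = entities
        .iter()
        .map(|e| (levenshtein(&e.name.to_lowercase(), &needle), e.id))
        .min_by_key(|&(distance, _)| distance);
    if let Some((distance, id)) = best {
        if distance <= max_distance {
            return Resolution::Fuzzy(id);
        }
    }

    // 4. Nothing close enough: create a new entity.
    let id = entities.len();
    entities.push(Entity {
        id,
        name: name.trim().to_string(),
        aliases: Vec::new(),
    });
    Resolution::Created(id)
}
```

Returning the stage that matched would let the UI flag fuzzy matches for manual review, since those are the most likely to merge two distinct people or organizations by mistake.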

Agent Integration

New tool for the research agent:

find_entity_connections(entity_name, max_hops) -> EntityGraph

Enables queries like: "What's the connection between Senator X and Company Y?"
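One way the tool could work is a breadth-first search limited to `max_hops`. The sketch below makes several assumptions: relationships are passed in as a slice instead of being read from SQLite, entities are identified by name, edges are traversed in both directions, and the returned graph is the induced subgraph over every entity reached. The extra `relationships` parameter is not part of the signature above.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified edge for this sketch; entities are identified by name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Relationship {
    pub source: String,
    pub target: String,
    pub kind: String,
}

/// Simplified graph: node names in discovery order, plus the edges among them.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct EntityGraph {
    pub nodes: Vec<String>,
    pub edges: Vec<Relationship>,
}

/// Collect everything within `max_hops` of `entity_name`.
pub fn find_entity_connections(
    relationships: &[Relationship],
    entity_name: &str,
    max_hops: usize,
) -> EntityGraph {
    // Adjacency list; edges are followed in both directions (assumption).
    let mut adjacency: HashMap<&str, Vec<&str>> = HashMap::new();
    for r in relationships {
        adjacency.entry(r.source.as_str()).or_default().push(r.target.as_str());
        adjacency.entry(r.target.as_str()).or_default().push(r.source.as_str());
    }

    // Breadth-first search, recording the hop count of each reached entity.
    let mut hops: HashMap<&str, usize> = HashMap::new();
    let mut order: Vec<&str> = Vec::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    hops.insert(entity_name, 0);
    order.push(entity_name);
    queue.push_back(entity_name);

    while let Some(current) = queue.pop_front() {
        let depth = hops[current];
        if depth == max_hops {
            continue; // do not expand beyond the hop limit
        }
        if let Some(neighbors) = adjacency.get(current) {
            for &next in neighbors {
                if !hops.contains_key(next) {
                    hops.insert(next, depth + 1);
                    order.push(next);
                    queue.push_back(next);
                }
            }
        }
    }

    // Keep every relationship whose endpoints were both reached.
    let edges = relationships
        .iter()
        .filter(|r| hops.contains_key(r.source.as_str()) && hops.contains_key(r.target.as_str()))
        .cloned()
        .collect();

    EntityGraph {
        nodes: order.into_iter().map(String::from).collect(),
        edges,
    }
}
```

For the "Senator X and Company Y" question, the agent could call this for Senator X and check whether Company Y appears among the returned nodes. A dedicated shortest-path query between two named entities may be a better fit, but that would change the signature proposed above.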
