LedgerLens - Multimodal Invoice Intelligence + GraphRAG

Reads an invoice image → extracts clean structured data → answers multi-hop supplier questions that vector search can't.

Problem Statement

Finance and procurement teams at mid-market firms manually key 50,000+ invoices/month at ~$3.50/invoice - a $175K/month problem. Worse, they can't answer relationship questions like:

"Which suppliers tied to delayed POs in Q3 also had quality complaints in the past 18 months?"

Pure vector RAG fails on these multi-hop entity-relationship queries because it has no concept of graph structure.

LedgerLens solves both halves:

Claude vision extracts structured data from invoice images at high accuracy with per-field confidence scoring and automatic human-review routing
Neo4j knowledge graph maps Supplier → Invoice → LineItem → PO for relationship reasoning
GraphRAG agent (LangGraph state machine) answers multi-hop questions and returns the full traversal path as an auditable explanation

How It Works

Invoice Image (scan / photo / PDF page)
        │
        ▼
┌─────────────────────────────┐
│   Claude Vision Extraction  │  claude-sonnet-4-6
│   + Pydantic Validation     │  Structured JSON output
│   + Confidence Scoring      │  Per-field 0.0–1.0 scores
└────────────┬────────────────┘
             │
     ┌───────┴────────┐
     ▼                ▼
Auto-approved    Human Review Queue
(conf ≥ 0.75)   (conf < 0.75, low fields flagged)
     │
     ▼
┌─────────────────────────────┐
│  LLM Entity Resolution      │  "Apple Inc" / "Apple Computer" → one node
│  + Neo4j Graph Loader       │  Supplier ↔ Invoice ↔ LineItem ↔ PO
└────────────┬────────────────┘
             │
             ▼
┌─────────────────────────────┐
│  LangGraph GraphRAG Agent   │  States: plan → retrieve → traverse → answer
│  Vector seed (pgvector)     │  Hybrid: semantic + graph traversal
│  + Graph traversal (Neo4j)  │  Returns answer + full path (auditability)
└────────────┬────────────────┘
             │
             ▼
┌─────────────────────────────┐
│  FastAPI + React UI         │  Upload invoice, ask questions
│  Langfuse observability     │  Span-level traces + cost per document
│  DeepEval / RAGAS evals     │  Field accuracy, groundedness, CI-gated
└─────────────────────────────┘

Key Features

Feature	Why It Matters
Multimodal extraction (Claude vision)	Handles scanned/photographed invoices - no OCR pre-processing required
Pydantic structured outputs + confidence routing	Low-confidence fields flagged for human review - the production realism employers look for
LLM entity resolution	Normalises "Apple Inc"/"Apple Computer" → one Neo4j node; the documented silent failure mode of GraphRAG
Neo4j knowledge graph	`Supplier ↔ Invoice ↔ LineItem ↔ PO` enables multi-hop questions vector search can't answer
GraphRAG agent (LangGraph)	Returns traversal path as auditable explanation — required by regulated buyers
DeepEval / RAGAS eval harness	Field-level accuracy + groundedness + context relevance, CI-gated - "the hardest skill to fake"
Full observability (Langfuse)	Span-level tracing: extraction → resolution → retrieval → answer + token cost per document
Cost panel	Per-document LLM cost vs $3.50 manual baseline - signals cost-optimisation discipline
GraphRAG vs vector-only comparison	Side-by-side on multi-hop questions - quantifies why the graph matters

Tech Stack

Extraction Layer:   Claude claude-sonnet-4-6 (vision) · Pydantic v2 · Pillow
Graph Layer:        Neo4j 5.15 Aura · pgvector / Qdrant (hybrid retrieval)
Agent Layer:        LangGraph · LangChain · Anthropic SDK
Eval Layer:         DeepEval · RAGAS · Langfuse · Arize Phoenix
API Layer:          FastAPI · Uvicorn
Frontend:           React · Next.js · TypeScript
Datasets:           CORD v2 (1,000 receipts, CC BY 4.0) · SROIE · FUNSD · DocILE
Infra:              Docker · Fly.io (API) · Neo4j Aura (free tier)

3-Day Build Plan

Day 1 - Extraction Pipeline ✅

Claude vision → Pydantic schema extraction with structured JSON prompt
Per-field confidence scoring (0.0–1.0) with math cross-validation
Human-review routing for low-confidence documents
CORD v2 + SROIE dataset download script
Field-level accuracy evaluation against ground-truth labels
Full pytest test suite

Day 2 - Knowledge Graph + GraphRAG Agent

Neo4j Aura schema: (:Supplier)→[:ISSUED]→(:Invoice)→[:CONTAINS]→(:LineItem)
LLM entity resolution: normalise supplier name variants → single canonical node
LangGraph agent state machine: extract → resolve → load → answer
Hybrid retrieval: pgvector semantic seed + Neo4j graph traversal
Traversal path returned as auditable explanation

Day 3 - Evals, Observability, Deploy

DeepEval/RAGAS harness: field accuracy + groundedness + context relevance
GraphRAG vs vector-only comparison notebook with results table
Langfuse/Phoenix tracing: span-level view + token cost per document
FastAPI backend + minimal React upload UI
Docker + Fly.io deploy (live URL)
Architecture diagram + eval results in README

Skill Coverage

2026 JD Requirement	This Project
Multimodal / vision	✅ Core feature
GraphRAG / knowledge graphs (Neo4j)	✅ Core feature
Eval design / LLM-as-judge	✅ Core feature
LangGraph / stateful agents	✅ Core feature
Observability (Langfuse/Phoenix)	✅ Core feature
RAG	✅ Hybrid GraphRAG
Vector DBs (pgvector/Qdrant)	✅ Seed retrieval
Structured outputs (Pydantic)	✅ Throughout
Python 3.12	✅
FastAPI	✅
Docker	✅
Prompt engineering	✅
CI/CD (eval-gated)	✅
Cloud deploy	✅ Fly.io
Cost optimisation	✅ Per-doc cost panel

Running It

# 1. Clone and install
git clone https://github.com/yourusername/ledgerlens
cd ledgerlens
pip install -r requirements.txt

# 2. Environment
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env

# 3. Download evaluation datasets
python scripts/download_datasets.py

# 4. Run accuracy evaluation (Day 1)
python scripts/run_eval.py

# 5. Extract a single invoice
python -c "
from src.ledgerlens.extraction.pipeline import ExtractionPipeline
result = ExtractionPipeline().extract('path/to/invoice.png')
print(result.model_dump_json(indent=2))
"

Cost Analysis

Volume	LedgerLens	Manual @ $3.50	Savings
1,000 invoices/mo	~$0.75	$3,500	99.98%
10,000 invoices/mo	~$7.50	$35,000	99.98%
50,000 invoices/mo	~$37.50	$175,000	99.98%

At claude-sonnet-4-6 pricing: $3/M input + $15/M output tokens. Avg ~250 input + 300 output tokens/invoice.

"I built a service that reads an invoice image into clean structured data and answers 'which suppliers behind last month's delayed POs also had quality issues' - with the full audit trail - using a knowledge graph instead of brittle vector search."

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
frontend		frontend
scripts		scripts
src/ledgerlens		src/ledgerlens
tests		tests
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LedgerLense Output.mov		LedgerLense Output.mov
README.md		README.md
docker-compose.yml		docker-compose.yml
fly.toml		fly.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LedgerLens - Multimodal Invoice Intelligence + GraphRAG

Problem Statement

How It Works

Key Features

Tech Stack

3-Day Build Plan

Day 1 - Extraction Pipeline ✅

Day 2 - Knowledge Graph + GraphRAG Agent

Day 3 - Evals, Observability, Deploy

Skill Coverage

Running It

Cost Analysis

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LedgerLens - Multimodal Invoice Intelligence + GraphRAG

Problem Statement

How It Works

Key Features

Tech Stack

3-Day Build Plan

Day 1 - Extraction Pipeline ✅

Day 2 - Knowledge Graph + GraphRAG Agent

Day 3 - Evals, Observability, Deploy

Skill Coverage

Running It

Cost Analysis

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages