Reads an invoice image → extracts clean structured data → answers multi-hop supplier questions that vector search can't.
Finance and procurement teams at mid-market firms manually key 50,000+ invoices/month at ~$3.50/invoice - a $175K/month problem. Worse, they can't answer relationship questions like:
"Which suppliers tied to delayed POs in Q3 also had quality complaints in the past 18 months?"
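Pure vector RAG fails on these multi-hop entity-relationship queries: it retrieves chunks by semantic similarity, so it has no representation of the explicit links (supplier → PO → complaint) that the question has to follow.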
LedgerLens solves both halves:
- Claude vision extracts structured data from invoice images at high accuracy with per-field confidence scoring and automatic human-review routing
- Neo4j knowledge graph maps Supplier → Invoice → LineItem → PO for relationship reasoning
- GraphRAG agent (LangGraph state machine) answers multi-hop questions and returns the full traversal path as an auditable explanation
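A minimal pure-Python sketch of why the Q3 question needs graph hops: it walks Supplier → Invoice → LineItem → PO on a toy in-memory graph and keeps the walked paths as the audit trail. The node IDs, the `REFERENCES` and `HAS_COMPLAINT` relations, and the property names are hypothetical illustrations, not the LedgerLens schema; in the real pipeline the equivalent hops would run in Neo4j.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy graph: (source, relation, target) triples following
# Supplier -> Invoice -> LineItem -> PO. REFERENCES and HAS_COMPLAINT are
# illustrative relation names, not the LedgerLens schema.
EDGES = [
    ("supplier:acme", "ISSUED", "invoice:1001"),
    ("invoice:1001", "CONTAINS", "line:1001-1"),
    ("line:1001-1", "REFERENCES", "po:17"),
    ("supplier:acme", "HAS_COMPLAINT", "complaint:88"),
    ("supplier:globex", "ISSUED", "invoice:2002"),
    ("invoice:2002", "CONTAINS", "line:2002-1"),
    ("line:2002-1", "REFERENCES", "po:21"),
]
# Hypothetical node properties.
NODE_PROPS = {
    "po:17": {"delayed": True, "quarter": "2025-Q3"},
    "po:21": {"delayed": True, "quarter": "2025-Q3"},
    "complaint:88": {"filed": date(2025, 2, 10)},
}


def build_adjacency(edges):
    """Index outgoing edges by source node."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((rel, dst))
    return adj


def hop(adj, node, relation):
    """Targets reachable from node over one edge of the given relation."""
    return [dst for rel, dst in adj.get(node, []) if rel == relation]


def delayed_po_paths(adj, supplier, quarter):
    """Supplier -ISSUED-> Invoice -CONTAINS-> LineItem -REFERENCES-> PO, kept if the PO was delayed."""
    paths = []
    for inv in hop(adj, supplier, "ISSUED"):
        for line in hop(adj, inv, "CONTAINS"):
            for po in hop(adj, line, "REFERENCES"):
                props = NODE_PROPS.get(po, {})
                if props.get("delayed") and props.get("quarter") == quarter:
                    paths.append([supplier, inv, line, po])
    return paths


def suppliers_with_delays_and_complaints(edges, quarter, since):
    adj = build_adjacency(edges)
    suppliers = sorted({src for src, _, _ in edges if src.startswith("supplier:")})
    answer = {}
    for supplier in suppliers:
        po_paths = delayed_po_paths(adj, supplier, quarter)
        complaints = [c for c in hop(adj, supplier, "HAS_COMPLAINT")
                      if NODE_PROPS[c]["filed"] >= since]
        if po_paths and complaints:
            # The traversal paths double as the auditable explanation.
            answer[supplier] = {"po_paths": po_paths, "complaints": complaints}
    return answer


# "since" approximates the 18-month window; the dates are illustrative.
print(suppliers_with_delays_and_complaints(EDGES, "2025-Q3", since=date(2024, 4, 1)))
```

A vector index could surface the complaint record or the PO record on its own, but the answer requires joining them through the supplier node, which is the hop chain above.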
Invoice Image (scan / photo / PDF page)
│
▼
┌─────────────────────────────┐
│ Claude Vision Extraction │ claude-sonnet-4-6
│ + Pydantic Validation │ Structured JSON output
│ + Confidence Scoring │ Per-field 0.0–1.0 scores
└────────────┬────────────────┘
│
┌───────┴────────┐
▼ ▼
Auto-approved Human Review Queue
(conf ≥ 0.75) (conf < 0.75, low fields flagged)
│
▼
┌─────────────────────────────┐
│ LLM Entity Resolution │ "Apple Inc" / "Apple Computer" → one node
│ + Neo4j Graph Loader │ Supplier ↔ Invoice ↔ LineItem ↔ PO
└────────────┬────────────────┘
│
▼
┌─────────────────────────────┐
│ LangGraph GraphRAG Agent │ States: plan → retrieve → traverse → answer
│ Vector seed (pgvector) │ Hybrid: semantic + graph traversal
│ + Graph traversal (Neo4j) │ Returns answer + full path (auditability)
└────────────┬────────────────┘
│
▼
┌─────────────────────────────┐
│ FastAPI + React UI │ Upload invoice, ask questions
│ Langfuse observability │ Span-level traces + cost per document
│ DeepEval / RAGAS evals │ Field accuracy, groundedness, CI-gated
└─────────────────────────────┘
| Feature | Why It Matters |
|---|---|
| Multimodal extraction (Claude vision) | Handles scanned/photographed invoices - no OCR pre-processing required |
| Pydantic structured outputs + confidence routing | Low-confidence fields flagged for human review - the production realism employers look for |
| LLM entity resolution | Normalises "Apple Inc"/"Apple Computer" → one Neo4j node, preventing duplicate supplier nodes, a documented silent failure mode of GraphRAG |
| Neo4j knowledge graph | Supplier ↔ Invoice ↔ LineItem ↔ PO enables multi-hop questions vector search can't answer |
| GraphRAG agent (LangGraph) | Returns traversal path as auditable explanation - required by regulated buyers |
| DeepEval / RAGAS eval harness | Field-level accuracy + groundedness + context relevance, CI-gated - "the hardest skill to fake" |
| Full observability (Langfuse) | Span-level tracing: extraction → resolution → retrieval → answer + token cost per document |
| Cost panel | Per-document LLM cost vs $3.50 manual baseline - signals cost-optimisation discipline |
| GraphRAG vs vector-only comparison | Side-by-side on multi-hop questions - quantifies why the graph matters |
Extraction Layer: Claude (claude-sonnet-4-6, vision) · Pydantic v2 · Pillow
Graph Layer: Neo4j 5.15 Aura · pgvector / Qdrant (hybrid retrieval)
Agent Layer: LangGraph · LangChain · Anthropic SDK
Eval Layer: DeepEval · RAGAS · Langfuse · Arize Phoenix
API Layer: FastAPI · Uvicorn
Frontend: React · Next.js · TypeScript
Datasets: CORD v2 (1,000 receipts, CC BY 4.0) · SROIE · FUNSD · DocILE
Infra: Docker · Fly.io (API) · Neo4j Aura (free tier)
- Claude vision → Pydantic schema extraction with structured JSON prompt
- Per-field confidence scoring (0.0–1.0) with math cross-validation
- Human-review routing for low-confidence documents
- CORD v2 + SROIE dataset download script
- Field-level accuracy evaluation against ground-truth labels
- Full pytest test suite
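A sketch of two of the items above: arithmetic cross-validation (line amounts, subtotal, and total must reconcile) used to down-weight confidences, and per-field exact-match accuracy against ground-truth labels. The 1-cent tolerance, the halve-on-failure penalty, the field names, and the exact-match rule are hypothetical choices, not the project's scoring logic.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    quantity: float
    unit_price: float
    amount: float


def cross_validate(items: list[LineItem], subtotal: float, tax: float,
                   total: float, tol: float = 0.01) -> list[str]:
    """Return the fields whose arithmetic does not reconcile within tol (currency units)."""
    failed = []
    for i, item in enumerate(items):
        if abs(item.quantity * item.unit_price - item.amount) > tol:
            failed.append(f"line_items[{i}].amount")
    if abs(sum(item.amount for item in items) - subtotal) > tol:
        failed.append("subtotal")
    if abs(subtotal + tax - total) > tol:
        failed.append("total")
    return failed


def penalise(confidences: dict[str, float], failed: list[str],
             factor: float = 0.5) -> dict[str, float]:
    """Hypothetical policy: scale down the confidence of every field that failed a check."""
    return {name: conf * factor if name in failed else conf
            for name, conf in confidences.items()}


def field_accuracy(predicted: list[dict], gold: list[dict]) -> dict[str, float]:
    """Per-field exact-match accuracy after trimming whitespace and lower-casing."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned document by document")
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for pred, ref in zip(predicted, gold):
        for name, ref_value in ref.items():
            totals[name] = totals.get(name, 0) + 1
            if str(pred.get(name, "")).strip().lower() == str(ref_value).strip().lower():
                hits[name] = hits.get(name, 0) + 1
    return {name: hits.get(name, 0) / n for name, n in totals.items()}
```

Exact match is deliberately strict; a real harness would likely normalise numbers and dates before comparing.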
- Neo4j Aura schema: (:Supplier)-[:ISSUED]->(:Invoice)-[:CONTAINS]->(:LineItem)
- LLM entity resolution: normalise supplier name variants → single canonical node
- LangGraph agent state machine: extract → resolve → load → answer
- Hybrid retrieval: pgvector semantic seed + Neo4j graph traversal
- Traversal path returned as auditable explanation
- DeepEval/RAGAS harness: field accuracy + groundedness + context relevance
- GraphRAG vs vector-only comparison notebook with results table
- Langfuse/Phoenix tracing: span-level view + token cost per document
- FastAPI backend + minimal React upload UI
- Docker + Fly.io deploy (live URL)
- Architecture diagram + eval results in README
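The hybrid retrieval item can be sketched in pure Python: rank nodes by cosine similarity to the query embedding (standing in for a pgvector `<=>` cosine-distance query), then expand breadth-first over graph edges up to `max_hops`, recording the path to each reached node so it can be returned as the explanation. The toy vectors, the adjacency dict, and the function names are hypothetical.

```python
import math
from collections import deque


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_seed(query: list[float], node_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Top-k nodes by cosine similarity; stands in for a pgvector nearest-neighbour query."""
    return sorted(node_vecs, key=lambda n: cosine(query, node_vecs[n]), reverse=True)[:k]


def expand(graph: dict[str, list[str]], seeds: list[str], max_hops: int = 2) -> dict[str, list[str]]:
    """Breadth-first expansion from the seeds, keeping the path that first reached each node."""
    paths = {s: [s] for s in seeds}
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in paths:
                paths[neighbour] = paths[node] + [neighbour]
                queue.append((neighbour, depth + 1))
    return paths


# Hypothetical toy data: 2-d embeddings and a directed adjacency list.
node_vecs = {"supplier:acme": [1.0, 0.0], "invoice:1001": [0.9, 0.1], "supplier:globex": [0.0, 1.0]}
graph = {"supplier:acme": ["invoice:1001"], "invoice:1001": ["po:17"], "po:17": ["supplier:globex"]}
seeds = vector_seed([1.0, 0.0], node_vecs, k=1)
print(expand(graph, seeds, max_hops=2))
```

Capping `max_hops` bounds how far the answer can drift from the semantically relevant seed, at the cost of missing longer chains.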
| 2026 JD Requirement | This Project |
|---|---|
| Multimodal / vision | ✅ Core feature |
| GraphRAG / knowledge graphs (Neo4j) | ✅ Core feature |
| Eval design / LLM-as-judge | ✅ Core feature |
| LangGraph / stateful agents | ✅ Core feature |
| Observability (Langfuse/Phoenix) | ✅ Core feature |
| RAG | ✅ Hybrid GraphRAG |
| Vector DBs (pgvector/Qdrant) | ✅ Seed retrieval |
| Structured outputs (Pydantic) | ✅ Throughout |
| Python 3.12 | ✅ |
| FastAPI | ✅ |
| Docker | ✅ |
| Prompt engineering | ✅ |
| CI/CD (eval-gated) | ✅ |
| Cloud deploy | ✅ Fly.io |
| Cost optimisation | ✅ Per-doc cost panel |
# 1. Clone and install
git clone https://github.com/yourusername/ledgerlens
cd ledgerlens
pip install -r requirements.txt
# 2. Environment
cp .env.example .env
# Add your ANTHROPIC_API_KEY to .env
# 3. Download evaluation datasets
python scripts/download_datasets.py
# 4. Run accuracy evaluation (Day 1)
python scripts/run_eval.py
# 5. Extract a single invoice
python -c "
from src.ledgerlens.extraction.pipeline import ExtractionPipeline
result = ExtractionPipeline().extract('path/to/invoice.png')
print(result.model_dump_json(indent=2))
"
| Volume | LedgerLens (LLM cost) | Manual @ $3.50 | Savings |
|---|---|---|---|
| 1,000 invoices/mo | ~$5.25 | $3,500 | ~99.85% |
| 10,000 invoices/mo | ~$52.50 | $35,000 | ~99.85% |
| 50,000 invoices/mo | ~$262.50 | $175,000 | ~99.85% |
At claude-sonnet-4-6 pricing ($3/M input, $15/M output tokens) and an average of ~250 input + 300 output tokens per invoice, each invoice costs about $0.00525 (250 × $3/M + 300 × $15/M = $0.00075 + $0.0045).
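The per-invoice arithmetic behind the cost comparison, using the pricing and token counts stated above. One caveat: Anthropic's vision documentation estimates image input at roughly (width × height) / 750 tokens, so a full-page scan will likely exceed 250 input tokens on its own; the `image_tokens` helper makes that rough estimate, and per-document costs measured in Langfuse should replace these assumptions. Function and constant names are hypothetical.

```python
INPUT_USD_PER_M = 3.00    # claude-sonnet-4-6 input price per million tokens, as stated above
OUTPUT_USD_PER_M = 15.00  # output price per million tokens, as stated above
MANUAL_USD_PER_INVOICE = 3.50


def llm_cost_per_invoice(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


def image_tokens(width_px: int, height_px: int) -> int:
    """Rough image-token estimate: (width * height) / 750."""
    return round(width_px * height_px / 750)


def monthly_comparison(volume: int, input_tokens: int = 250, output_tokens: int = 300):
    llm = volume * llm_cost_per_invoice(input_tokens, output_tokens)
    manual = volume * MANUAL_USD_PER_INVOICE
    return llm, manual, 1 - llm / manual


for volume in (1_000, 10_000, 50_000):
    llm, manual, savings = monthly_comparison(volume)
    print(f"{volume:>6,} invoices/mo: LLM ${llm:,.2f} vs manual ${manual:,.0f} ({savings:.2%} saved)")
```

To see the effect of image size, pass a measured count instead of the default, e.g. `monthly_comparison(50_000, input_tokens=image_tokens(1000, 1000) + 250)`.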
"I built a service that reads an invoice image into clean structured data and answers 'which suppliers behind last month's delayed POs also had quality issues' - with the full audit trail - using a knowledge graph instead of brittle vector search."