Accounts-payable automation platform that ingests vendor PDF invoices, runs a LangChain multi-agent pipeline, and stores structured results in SQLite with PDFs on AWS S3.
Dashboard — KPIs, multi-agent pipeline overview, quick actions, and recent activity.
Upload — Single PDF upload, batch processing, pipeline steps, and sample extraction result.
Invoice registry — Sortable table with confidence bars, validation status, and PDF links.
Metrics — Status breakdown, confidence analysis, and 24-hour activity.
Clear Ledger AP reduces manual invoice handling by automating:
- Extraction - Parse PDFs (text + OCR) and pull vendor, dates, line items, and totals via OpenAI
- Validation - Enforce Pydantic schemas, confidence scoring, and anomaly detection
- PO matching - Fuzzy match against `data/raw/vendor_data.csv`
- Human review - Queue low-confidence or failed documents for correction in the UI
- RAG fallback - FAISS index over sample fault patterns to classify and recover common errors
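The PO matching step can be pictured with a small fuzzy-matching sketch. This is only an illustration using Python's standard `difflib`; the repository's matcher may use a different library and scoring rule, and the function name `best_vendor_match` and the `min_ratio` cutoff of 0.8 are assumptions, not values from the project.

```python
from difflib import SequenceMatcher


def _normalize(name: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences do not matter."""
    return " ".join(name.lower().split())


def best_vendor_match(vendor_name, known_vendors, min_ratio=0.8):
    """Return (vendor, ratio) for the closest known vendor, or None.

    min_ratio is a hypothetical cutoff chosen for illustration only.
    """
    best = None
    for candidate in known_vendors:
        ratio = SequenceMatcher(
            None, _normalize(vendor_name), _normalize(candidate)
        ).ratio()
        if best is None or ratio > best[1]:
            best = (candidate, ratio)
    if best is not None and best[1] >= min_ratio:
        return best
    return None
```

In practice the list of known vendors would be loaded from `data/raw/vendor_data.csv`; its column layout is not described here, so the loading step is left out.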
Target throughput: on the order of thousands of invoices per month with sub-minute processing per document.
Next.js UI (3000) --> FastAPI API + WebSockets (8000) --> Agent orchestrator
|-> SQLite (invoices.db)
|-> AWS S3 (PDFs)
| Layer | Stack |
|---|---|
| Frontend | Next.js 14, React 18, Tailwind CSS, TanStack Query |
| Backend | FastAPI, Uvicorn, WebSockets |
| Agents | LangChain workflow: extraction, validation, PO match, human review, FAISS fallback |
| Storage | SQLite metadata, S3 object storage for originals |
| Deploy | Docker Compose, Render or Railway (API), Vercel (frontend) |
- Extraction - gpt-4o-mini, pdfplumber, Tesseract OCR
- Validation - Schema checks, confidence thresholds
- PO matching - Fuzzy string match to purchase orders
- Human review - Triggered when confidence < 0.9 or validation fails
- Fallback (RAG) - FAISS + sentence-transformers on `data/test_samples/`
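
The human-review trigger listed above (confidence below 0.9, or a failed validation) amounts to a simple routing rule. The sketch below is a minimal illustration; the function and constant names are hypothetical and are not taken from the repository.

```python
REVIEW_THRESHOLD = 0.9  # threshold stated in the stage list above


def needs_human_review(confidence: float, validation_passed: bool,
                       threshold: float = REVIEW_THRESHOLD) -> bool:
    """Route a document to review if validation failed or confidence is low."""
    return (not validation_passed) or confidence < threshold
```

A document scoring exactly 0.9 with a passing validation is not queued under this reading of the rule, since the condition is strictly "less than".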
clear_ledger_ap/
api/ FastAPI routes and WebSocket progress
agents/ LangChain agent implementations
workflows/ Orchestrator
data_processing/ OCR, RAG, PO matcher, scoring
data/raw/invoices/ Sample PDF batch (~35 files)
data/test_samples/ Faulty PDFs for RAG training
frontend-nextjs/ Dashboard (upload, invoices, review, metrics, anomalies)
invoices.db SQLite (created at runtime)
docker-compose.yml
Prerequisites: Docker, OpenAI API key, AWS credentials and S3 bucket.
- Clone and enter the repo:

      git clone https://github.com/chris9753/clear_ledger_nextjs.git
      cd clear_ledger_nextjs

- Create `.env` in the project root:

      OPENAI_API_KEY=your_key
      AWS_ACCESS_KEY_ID=your_access_key
      AWS_SECRET_ACCESS_KEY=your_secret_key
      BUCKET_NAME=your_bucket_name
      S3_BUCKET_NAME=your_bucket_name
      AWS_DEFAULT_REGION=us-east-1

- Start services:

      docker compose up --build -d

- Open the app:
  - Frontend: http://localhost:3000
  - API docs: http://localhost:8000/docs
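
Before starting the services, it can help to confirm that every variable from the `.env` example is actually set. The following is a small sketch of such a check; it assumes all six names shown in the `.env` example are required, which the project may or may not enforce, and the helper name `missing_vars` is hypothetical.

```python
import os

# Names copied from the .env example; treating all of them as required
# is an assumption made for this sketch.
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "BUCKET_NAME",
    "S3_BUCKET_NAME",
    "AWS_DEFAULT_REGION",
]


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the required names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]
```

Calling `missing_vars()` with no arguments inspects the current process environment; an empty list means nothing is missing.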

    docker pull chris9753/clear_ledger_nextjs_backend:latest
    docker pull chris9753/clear_ledger_nextjs_frontend:latest

Backend (from repo root):

    python -m venv .venv
    # Windows: .\.venv\Scripts\Activate.ps1
    pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cpu
    pip install -r requirements.txt
    uvicorn api.app:app --reload --port 8000

Frontend:

    cd frontend-nextjs
    npm install
    npm run dev

Set `NEXT_PUBLIC_USE_MOCK_DATA=true` in `frontend-nextjs/next.config.js` to run the UI without a backend (sample data only).
Apply a policy similar to aws_policy.json in the repo. Disable "Block all public access" only if you intend to serve PDFs via direct URLs.
If upgrading from JSON-backed storage:

    python migrate_json_to_db.py --json-path data/processed/structured_invoices.json
    sqlite3 invoices.db "SELECT COUNT(*) FROM invoice_metadata;"

| Route | Purpose |
|---|---|
| `/` | Dashboard and pipeline overview |
| `/upload` | Single upload and batch processing |
| `/invoices` | Searchable invoice registry |
| `/review` | Human-in-the-loop corrections |
| `/metrics` | Throughput and confidence analytics |
| `/anomalies` | Error and low-confidence log |
GitHub Actions deploys the frontend to Vercel on push to main (see .github/workflows/frontend-deploy.yml).
Required GitHub secrets:
| Secret | Description |
|---|---|
| `VERCEL_TOKEN` | Vercel account token |
| `VERCEL_ORG_ID` | Team/user ID from Vercel project settings |
| `VERCEL_PROJECT_ID` | Project ID from Vercel project settings |
Vercel project setup (required):
- Import the repo in Vercel.
- Settings → General → Root Directory → set to `frontend-nextjs` → Save.
  If this stays at the repo root, Vercel sees `api/app.py` (FastAPI) and builds fail.
  Important: With Root Directory set to `frontend-nextjs`, the GitHub workflow must run `vercel` commands from the repository root, not `cd frontend-nextjs`; otherwise Vercel looks for `frontend-nextjs/frontend-nextjs` and errors.
- Add `NEXT_PUBLIC_MAIN_API_URL` under Environment Variables (Production) pointing at your deployed backend API.
CI workflow: Runs vercel pull, vercel build, and vercel deploy --prebuilt from the repo root. Vercel applies the Root Directory setting automatically.
You can also connect the repo directly in Vercel for automatic deploys; the GitHub Action is useful if you want deploys gated on CI or triggered manually.
Deploy the API to Railway or Render. Deploy the UI to Vercel. You do not need Docker installed on your machine for this flow.
| Platform | What it runs | Root directory |
|---|---|---|
| Railway | FastAPI backend | Repo root (empty) |
| Vercel | Next.js frontend | frontend-nextjs |
- Push code to GitHub.
- Railway connects to the repo and builds in the cloud (see `railway.toml`, which uses `backend/Dockerfile` on Railway's builders; you never run `docker compose` locally).
- Vercel connects to the same repo with root directory `frontend-nextjs`.
After code fixes, push to main and let Railway redeploy (or click Redeploy in the Railway dashboard). No local Docker build is required.
- railway.app → New Project → Deploy from GitHub repo → select this repository.
- Root directory: leave empty (repo root). Do not set `frontend-nextjs`.
- Railway reads `railway.toml` (cloud build from `backend/Dockerfile`, health check `/health`). In the service Settings → Build, builder should be Dockerfile (not a custom Nixpacks-only setup unless you maintain that yourself).
- Settings → Networking → Generate Domain and copy the public URL.
- Variables (backend service → Variables; required before the app will stay up):
| Variable | Example / notes |
|---|---|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (`sk-...`). Without it the container crashes on startup. |
| `AWS_ACCESS_KEY_ID` | Secret |
| `AWS_SECRET_ACCESS_KEY` | Secret |
| `S3_BUCKET_NAME` | Your bucket |
| `AWS_DEFAULT_REGION` | `us-east-1` |
| `CORS_ORIGINS` | `https://clear-ledger-ap.vercel.app` (comma-separate multiple origins) |
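
Because `CORS_ORIGINS` holds a comma-separated list, the backend has to split it before handing the origins to its CORS configuration. The sketch below shows one plausible way to do that; the helper name `parse_cors_origins` is hypothetical and the repository's actual parsing may differ.

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean list.

    Surrounding whitespace is trimmed and empty entries are dropped,
    so "a, b," and "a,b" yield the same result.
    """
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

The resulting list is the shape that FastAPI's `CORSMiddleware` accepts for its `allow_origins` argument.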
Optional persistence (recommended):
- Service → Volumes → Add Volume → mount path `/app/data`
- Either rely on auto paths (`RAILWAY_VOLUME_MOUNT_PATH` → `invoices.db` and `data/` under the volume), or set explicitly:
  - `DATABASE_PATH=/app/data/invoices.db`
  - `DATA_DIR=/app/data/data`
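
One way to read the path options above is as a precedence order: explicit `DATABASE_PATH` and `DATA_DIR` first, then paths derived from `RAILWAY_VOLUME_MOUNT_PATH`, then local defaults. The sketch below illustrates that reading only; the precedence, the local defaults, and the function name `resolve_storage_paths` are assumptions and may not match the backend's actual logic.

```python
from pathlib import PurePosixPath


def resolve_storage_paths(env):
    """Return (database_path, data_dir) from an environment mapping.

    Assumed precedence: explicit variables, then the Railway volume
    mount path, then repo-relative defaults.
    """
    volume = env.get("RAILWAY_VOLUME_MOUNT_PATH")
    if volume:
        default_db = str(PurePosixPath(volume) / "invoices.db")
        default_data = str(PurePosixPath(volume) / "data")
    else:
        default_db, default_data = "invoices.db", "data"
    return (
        env.get("DATABASE_PATH") or default_db,
        env.get("DATA_DIR") or default_data,
    )
```

With a volume mounted at `/app/data` and no explicit variables, this yields the same two paths as the explicit settings listed above.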
Railway sets PORT automatically; backend/start.sh binds to it.
Start command: leave empty (uses Docker CMD → backend/start.sh).
Do not override with a custom start command unless you use the same uvicorn line.
API docs: https://<your-service>.up.railway.app/docs
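
After a deploy, the `/health` path used for the platform health check can also be polled from a script to confirm the service is up. The following is a rough sketch using only the standard library; it assumes `/health` answers with HTTP 200 when the service is ready, and the retry count, delay, and function names are illustrative choices, not values from the project.

```python
import time
import urllib.error
import urllib.request


def http_probe(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts: int = 12, delay: float = 5.0,
                       sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A typical call would be `wait_until_healthy(lambda: http_probe("https://<your-service>.up.railway.app"))`, with the placeholder replaced by the real service URL.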
Use the included render.yaml blueprint or create the service manually with the same settings.
- Push this repo to GitHub (if not already).
- In Render: New → Blueprint → connect the repo → apply `render.yaml`.
- When prompted, set secret environment variables:
  - `OPENAI_API_KEY`
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
  - `S3_BUCKET_NAME` (your S3 bucket name)
  - `CORS_ORIGINS`: your Vercel frontend URL(s), comma-separated, e.g. `https://your-app.vercel.app`
- After deploy, copy the service URL (e.g. `https://clearledger-api.onrender.com`).
| Setting | Value |
|---|---|
| Runtime | Docker |
| Dockerfile path | backend/Dockerfile |
| Docker context | Repository root (.) |
| Health check path | /health |
| Disk (paid plans) | Mount /var/data, 1 GB |
Environment variables:
| Variable | Example / notes |
|---|---|
| `DATABASE_PATH` | `/var/data/invoices.db` (with persistent disk) |
| `DATA_DIR` | `/var/data/data` |
| `AWS_DEFAULT_REGION` | `us-east-1` |
| `OPENAI_API_KEY` | Secret |
| `AWS_ACCESS_KEY_ID` | Secret |
| `AWS_SECRET_ACCESS_KEY` | Secret |
| `S3_BUCKET_NAME` | Your bucket |
| `CORS_ORIGINS` | `https://clear-ledger-ap.vercel.app` (comma-separate multiple origins) |
Render injects PORT automatically; the container entrypoint reads it via backend/start.sh.
In Vercel → Project → Environment Variables (Production):
- `NEXT_PUBLIC_MAIN_API_URL` = your backend URL (no trailing slash), e.g. `https://clearledger-ap-production.up.railway.app`
Redeploy the frontend after changing this variable (or rely on the default in frontend-nextjs/next.config.js after a new build).
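
The "no trailing slash" requirement matters because clients usually build endpoint URLs by appending a path to the base. A defensive join that tolerates either form might look like the sketch below; the helper name `endpoint_url` is hypothetical, and the frontend itself may concatenate strings differently.

```python
def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")
```

Both `endpoint_url("https://example.com/", "/docs")` and `endpoint_url("https://example.com", "docs")` produce `https://example.com/docs`, which avoids the doubled slash that a trailing slash in the variable would otherwise cause.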
- Persistent storage: use a Render disk (`/var/data`) or a Railway volume (`/app/data`). Without it, SQLite resets on redeploy.
- Cold starts on free/low tiers can take 30–60+ seconds after idle.
- WebSockets (batch upload progress) work over `wss://` when the API is served over HTTPS.
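
The WebSocket note follows from the usual scheme mapping: an API served over `https://` is reached over `wss://`, and one served over `http://` over `ws://`. The sketch below derives a WebSocket URL from the API base URL; the function name and the `/ws/progress` path in the usage note are placeholders, since the project's actual WebSocket route is not given here.

```python
from urllib.parse import urlsplit, urlunsplit

_WS_SCHEMES = {"https": "wss", "http": "ws"}


def websocket_url(api_base: str, path: str) -> str:
    """Map an http(s) API base URL to the matching ws(s) URL for `path`."""
    parts = urlsplit(api_base)
    if parts.scheme not in _WS_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    full_path = parts.path.rstrip("/") + "/" + path.lstrip("/")
    return urlunsplit((_WS_SCHEMES[parts.scheme], parts.netloc, full_path, "", ""))
```

For example, `websocket_url("http://localhost:8000", "/ws/progress")` gives `ws://localhost:8000/ws/progress`, while the same call against an HTTPS deployment yields a `wss://` URL.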
MIT - see LICENSE.



