Local, offline-friendly Retrieval-Augmented Generation over PDFs/Excel/CSV with:
- Hybrid retrieval (BM25 + dense embeddings) + cross-encoder reranker
- Page-level citations and structured JSON/table answers
- Confidence gating + "unknown" path on low evidence
- Evaluation harness for exact-match/citation checks
- CLI-first; optional REST API to support a future desktop app
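
The confidence gate and "unknown" path listed above can be pictured as a threshold on evidence scores. The sketch below is illustrative only: `Passage`, `gate`, `answer_or_unknown`, the `[0, 1]` score range, and the `min_score`/`min_support` defaults are assumptions, not names or values taken from this project.

```python
from dataclasses import dataclass

UNKNOWN_ANSWER = "unknown"


@dataclass
class Passage:
    text: str
    page: int
    score: float  # assumed: a relevance score in [0, 1] from the reranker


def gate(passages, min_score=0.5, min_support=1):
    """Return the passages that clear the threshold, or None if evidence is too weak."""
    supported = [p for p in passages if p.score >= min_score]
    if len(supported) < min_support:
        return None
    return supported


def answer_or_unknown(passages, generate):
    """Call `generate` only when the gate passes; otherwise take the 'unknown' path."""
    evidence = gate(passages)
    if evidence is None:
        return {"answer": UNKNOWN_ANSWER, "citations": []}
    # Cite the page of every passage that was actually handed to the generator.
    return {"answer": generate(evidence), "citations": [p.page for p in evidence]}
```

Here `generate` stands in for whatever function calls the local LLM. Keeping the gate separate from generation means the model is never asked to answer from evidence that failed the threshold.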
Stack: OpenSearch (BM25 + k-NN vectors), sentence-transformers (BGE embeddings + reranker), FastAPI (optional API), Ollama (local LLM).
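
This README does not say how the BM25 and dense result lists are merged before reranking. One common option is reciprocal rank fusion (RRF), sketched below as an illustration rather than as this project's actual method. The function name, the document IDs, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["p3", "p1", "p7"]   # hypothetical BM25 ranking
dense_hits = ["p1", "p9", "p3"]  # hypothetical k-NN ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['p1', 'p3', 'p9', 'p7']
```

RRF uses only ranks, so the BM25 and vector scores never need to be put on a common scale. A cross-encoder reranker would then rescore the top of the fused list.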
- Start infra (OpenSearch + Ollama) with Docker (see below).
- Create the index: `rag create-index`
- Put files into `input_docs/`, then run `rag ingest input_docs/`
- Ask a question: `rag query "Show Q1 revenue by product; include citations." --format table`
- Evaluate: add questions to `eval/questions.yaml`, then run `rag eval eval/questions.yaml`
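
To make the exact-match and citation checks concrete, here is a minimal sketch of how one evaluation case might be scored. The record layout (`answer`, `citations`), the citation string format, and the function names are hypothetical. The real schema of `eval/questions.yaml` is not shown in this README.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not fail a match."""
    return " ".join(text.lower().split())


def score_case(expected, predicted):
    """Compare one predicted answer against its expected record."""
    exact = normalize(expected["answer"]) == normalize(predicted["answer"])
    required = set(expected.get("citations", []))
    cited = set(predicted.get("citations", []))
    # Citations pass when every required source appears among the cited ones.
    return {"exact_match": exact, "citations_ok": required <= cited}


def summarize(results):
    """Average each metric over all scored cases."""
    n = len(results)
    if n == 0:
        return {"exact_match": 0.0, "citations_ok": 0.0}
    return {key: sum(r[key] for r in results) / n
            for key in ("exact_match", "citations_ok")}


expected = {"answer": "4.2 million", "citations": ["report.pdf#p4"]}  # made-up example
predicted = {"answer": " 4.2  Million ", "citations": ["report.pdf#p4", "report.pdf#p5"]}
print(score_case(expected, predicted))  # {'exact_match': True, 'citations_ok': True}
```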
The project focuses on accuracy and determinism. The same core is intended to power a desktop UI later (Electron/Tauri calling the local REST API).
License: MIT
    docker compose up -d

Windows PowerShell:

    Invoke-WebRequest -Uri "http://localhost:11434/api/pull" -Method POST -Body '{"name":"llama3.2:1b"}' -ContentType "application/json"

Linux/macOS/bash:

    curl http://localhost:11434/api/pull -d '{"name":"llama3.2:1b"}'

    python -m pip install -U pip
    pip install -e .
    rag init
    rag create-index

    # put docs into input_docs/ first
    rag ingest input_docs/

    rag query "Show Q1 revenue by product; include citations." --format table

    # edit eval/questions.yaml to your dataset
    rag eval eval/questions.yaml

    rag serve  # then POST to http://127.0.0.1:8000/query
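
Once `rag serve` is running, the endpoint can be called from any HTTP client. The sketch below uses only the Python standard library. The request body is an assumption: this README gives the URL but not the schema, so the `question` field is hypothetical and should be checked against the API's actual contract.

```python
import json
import urllib.request

QUERY_URL = "http://127.0.0.1:8000/query"  # endpoint named in this README


def build_query_request(question, url=QUERY_URL):
    """Build (but do not send) a JSON POST request for the query endpoint."""
    # Assumption: the server accepts a JSON object with a "question" field.
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=60):
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server up, `ask("Show Q1 revenue by product; include citations.")` would return whatever JSON the API produces. Splitting request construction from sending lets the payload be inspected or tested without a running server.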