Live demo: not deployed yet. Run locally — see `START_HERE.md` for a single AI-agent prompt that boots everything end-to-end, or follow the manual steps below.
If you have Cursor, Claude Code, Antigravity, GitHub Copilot Chat, Codex CLI, or any similar AI IDE, you do not have to read the rest of this README. Open `START_HERE.md`, copy the entire prompt block, and paste it into the agent. It will:
- Detect your OS / shell (PowerShell, bash or zsh).
- Check Python 3.11+, Node 20+, Docker Desktop, optional Tesseract OCR.
- Copy `.env.example` to `.env` and ask once for your API keys.
- Bring up Postgres + Qdrant via `docker-compose`.
- Install Python (`pip install -e ".[dev]"`) and Node deps.
- Initialise the relational schema and the Qdrant collection.
- Start FastAPI on `:8000` and Next.js on `:3000`.
- Health-check every UI route and the `/health` endpoint.
- Optionally seed the demo corpus.
That is the fastest path from `git clone` to a working browser tab.
- Universal document ingestion through a single format router (`app/ingestion/router.py`) — PDF, DOCX, XLSX/XLS/CSV, PPTX, TXT/Markdown/HTML/JSON, and images (PNG/JPG/WEBP/TIFF/BMP). New formats are a one-line change in the extension map.
- PDF nuance preserved — auto-detects `text_pdf`, `image_pdf`, and `mixed_pdf`, runs Tesseract OCR for image pages, with an optional Vision-LLM fallback for low-confidence pages.
- Heading-aware chunking that keeps headings glued to their first paragraph and preserves table/list blocks.
- True hybrid retrieval on Qdrant: dense embeddings + sparse BM25 with server-side RRF fusion, and a persisted BM25 corpus for consistent ingest/query encoding.
- Optional query rewriting (LLM-driven, feature-flagged) triggered only when initial retrieval scores are weak.
- Cohere reranking with a local lexical fallback when the API is unavailable.
- MMR diversification + per-document cap applied after rerank.
- Calibrated confidence combining reranker score, score margin, answer↔context overlap, and refusal detection.
- Unified evidence and confidence gates (`app/query/gates.py`) shared by `/query` and `/chat` — prefer "I don't know" over hallucinations.
- Optional grounded citations in API responses (`include_citations: true`).
- Provider failover for completion and embeddings (Groq / OpenAI / OpenRouter, plus Cohere for embeddings) with bounded retries on streaming connections.
- Multi-turn chat with session storage, short-term history and branching (fork from any prior turn into a new session) on top of the same retrieval pipeline.
- Document versioning — re-uploading a file with the same original name increments `documents.version` instead of creating an orphaned duplicate. The pipeline keys versioning off the user-facing filename, not the temp upload path.
- PII redaction at ingest — an opt-in regex (or Presidio) pass that scrubs emails, phone numbers, IPs, SSN/PAN/Aadhaar, credit-card-style sequences, URLs, and DOBs before embedding, so vectors, BM25, citations, and source previews never see the raw value.
- Sliding-window rate limiting on `/query` and `/chat` — independent per-IP and per-cookie windows with `Retry-After` and `X-RateLimit-Scope` headers, plus a structured `detail` JSON the UI translates into a toast.
- Outbound webhooks for `ingestion.complete`, `query.completed`, and `query.refused`, with HMAC signing, an admin CRUD surface, and a test endpoint that delivers regardless of the subscription's `enabled` flag.
- Streaming APIs via Server-Sent Events: `/query/stream` and `/chat/stream` emit `delta` events followed by a `final` event (with optional citations).
- Rich visual answers — Markdown answers render Mermaid diagrams, KaTeX math, syntax-highlighted code (Shiki), Recharts charts, and (when the active visual model supports it) generated images. Visual richness is detected automatically from the model-name pattern; the user just asks a question.
- Premium Next.js UI — auth shell, sidebar/topbar app shell, query + chat surfaces, document library (clean filenames + copyable short document IDs + even-height cards) and a live status page. Single-token typography (15/14/13/12/11px tiers), accessible Radix primitives, light + dark themes, mobile-first.
- Auto-summary on ingest — one LLM call produces a 2-3 sentence abstract stored in `documents.summary`; surfaced in the inspect sheet and search.
- Operator surfaces — `/health` reports uptime + DB pool + vector index size + queue depth; `/metrics/recent` and `/logs/recent` power the live status page; `/preferences` and `/settings/schema` back the in-app settings panel.
- Lightweight in-place migrations — startup runs an idempotent migration that adds new columns (`documents.summary/tags/version`, `chat_sessions.title/parent_session_id/parent_turn_id`) when older databases are encountered, so deploys never require a manual ALTER.
- Admin CLI — `python -m app.cli {ingest,query,clear-cache,export-corpus,eval}` covers ops without a browser.
- Typed SDKs — `python -m scripts.gen_sdk` regenerates a Python (`app/sdk/python`) and TypeScript (`helpdesk-ui/src/lib/sdk`) client from the live OpenAPI schema.
- Generic refusal on errors — pipelines never leak raw exceptions to clients.
| Layer | Technology |
|---|---|
| Backend | Python 3.11+, FastAPI, Pydantic v2, structlog |
| Document ingestion | PyMuPDF, python-docx, python-pptx, openpyxl/xlrd, pandas, beautifulsoup4, markdown, chardet, Pillow, pdf2image, pytesseract |
| Text Chunking | langchain-text-splitters with heading-aware splits |
| Embeddings | OpenAI / OpenRouter / Cohere (configurable, with cross-provider failover) |
| Vector DB | Qdrant (named dense + sparse vectors, RRF fusion) or Milvus (dense-only) |
| Relational DB | PostgreSQL or MySQL |
| Reranking | Cohere Rerank with local lexical fallback |
| Frontend | Next.js 14 (App Router), Tailwind CSS, Radix UI, Framer Motion |
| Answer rendering | react-markdown + remark-gfm + remark-math, rehype-katex, Shiki, Mermaid, Recharts |
| Tooling | ruff, black, mypy, pytest, eslint, next lint |
The full reference manual lives under `docs/`:

- `docs/reference/MANUAL.md` — implementation-level behaviour, accuracy/grounding strategy, and API contracts.
- `docs/reference/sampleqna.md` — sample Q&A content used for testing.
- `docs/diagram-v4.html` — interactive end-to-end architecture diagram.
```text
Ingestion: Document → Format router → Parse (+ OCR for image pages) →
           Heading-aware chunks → Dense + Sparse vectors →
           Persist BM25 corpus → Upsert Qdrant (named dense + sparse) +
           Relational DB (Postgres/MySQL)

Query / Chat: Question → Router (category + intent) → Optional rewrite →
              Hybrid search (dense + sparse, server-side RRF) → Cohere rerank
              with lexical fallback → MMR + per-doc cap → Evidence gate →
              Generator (category-aware system prompt) → Calibrated confidence
              gate → Response (+ optional citations / visuals / SSE deltas)
```
- Ingestion dispatches each file through the format router, parses it (with OCR when needed for PDFs and images), chunks text with heading-aware boundaries, builds dense + sparse vectors, persists BM25 corpus statistics for consistent query-time encoding, upserts named-vector points into Qdrant, and stores document metadata in the relational store.
- Query routes the request, optionally rewrites the question when retrieval scores are weak, fetches candidates with hybrid search (server-side RRF), reranks them, diversifies with MMR + per-doc cap, then generates a grounded answer that is gated by both an evidence check and a calibrated confidence threshold.
- Chat layers session history on top of the same retrieval pipeline and shares the same evidence/confidence gates.
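The hybrid-search fusion above happens server-side in Qdrant, but the idea behind Reciprocal Rank Fusion is easy to sketch in a few lines. The function below is illustrative only — it is not the project's API and Qdrant does not expose it this way:

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Fuse two ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each id accumulates 1 / (k + rank) from every list it appears in;
    k=60 matches the HYBRID_RRF_K default documented below.
    """
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# A chunk ranked well by both retrievers beats one that tops only a single list.
fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])  # → ["b", "c", "a", "d"]
```

Because RRF only consumes ranks, not raw scores, the dense and sparse retrievers never need score calibration against each other.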
For a deeper, implementation-level explanation see docs/reference/MANUAL.md.
Prefer the AI-agent path? Use `START_HERE.md` and skip the rest of this section.
```bash
git clone https://github.com/Jayesh12356/RAG_Engine.git
cd RAG_Engine
cp .env.example .env
```

Fill in at least:
- `OPENROUTER_API_KEY` (or `GROQ_API_KEY` / `OPENAI_API_KEY`)
- `OPENAI_API_KEY` (used for embeddings by default)
- `COHERE_API_KEY` (for reranking — optional, the lexical fallback works without it)
- DB URLs, or keep defaults for local Docker
If you want OCR for scanned/image documents, install Tesseract locally and configure these env vars (see `.env.example`):

- `OCR_ENABLED`, `OCR_MODE` (`tesseract|vision|hybrid`), `OCR_LANGUAGES`, `OCR_RENDER_DPI`
- `OCR_TEXT_CONFIDENCE_THRESHOLD`, `OCR_VISION_FALLBACK_ENABLED`
- `TESSERACT_CMD` (full Windows path if needed)
```bash
make up            # Qdrant/Milvus + Postgres/MySQL via docker-compose
make install       # backend (Python) + frontend (Node) deps
make init          # creates relational tables and the vector collection
make dev-backend   # FastAPI with autoreload
make dev-frontend  # Next.js dev server
```

Open http://localhost:3000.
Use the UI to upload files, or call `POST /ingest` directly. Sample assets live under `data/sample_pdfs/`.
Migrating to Qdrant hybrid: if you previously ran the dense-only collection, run `python scripts/migrate_qdrant_hybrid.py --confirm` and re-ingest your documents. The script recreates the Qdrant collection with the named dense + sparse vectors required for server-side RRF.
- Keep the frontend deployment connected to the `helpdesk-ui` source repository.
- Set `NEXT_PUBLIC_API_URL=https://<your-render-backend>.onrender.com`.
- Do not leave this env var empty, otherwise the frontend falls back to `localhost`.
Use this as the Render start command:
```bash
python scripts/bootstrap_start.py
```

What this command does on each deploy:

- Initialises the relational schema idempotently (`create_all`).
- Ensures the vector collection exists in Qdrant (with named dense + sparse vectors).
- Starts FastAPI/Uvicorn.
```env
RELATIONAL_DB=postgres
DATABASE_URL=postgres://<user>:<password>@<host>:5432/<db>
DB_SCHEMA=helpdesk_chatbot
VECTOR_DB=qdrant
QDRANT_URL=https://<cluster-id>.<region>.aws.cloud.qdrant.io:6333
QDRANT_API_KEY=<qdrant-cloud-api-key>
QDRANT_COLLECTION=helpdesk_chunks
CORS_ALLOW_ORIGINS=https://<your-vercel-app>.vercel.app,http://localhost:3000
```

Notes:
- `DATABASE_URL` is preferred in production; `postgres://` is auto-normalised to the async SQLAlchemy format.
- `DB_SCHEMA` isolates tables when sharing one Postgres instance; use a unique schema per project.
- A Qdrant API key is optional locally but required for most Qdrant Cloud projects.
- Seed data ingestion is intentionally skipped in production startup (only schema + collection bootstrap).
- Health
  - `GET /health` — returns `provider`, `vector_db`, `relational_db`, `demo_mode`, `visual_capable`, `image_gen_active`, plus operational counters: `uptime_seconds`, `db_pool` (size / checked_in / checked_out / overflow), `vector_index_size`, `queue_depth`.
- Ingestion & documents
  - `POST /ingest` — accepts every supported document type (see `app/ingestion/router.py`); SSE progress is delivered via `task_id`.
  - `GET /documents` / `GET /documents/{document_id}/chunks` / `DELETE /documents/{document_id}`.
  - `GET /pdfs/{pdf_name}` and `GET /pdfs/by-id/{document_id}` for source links.
  - `GET /tags` and `POST /documents/{document_id}/tags` for tag/space management.
- Query
  - `POST /query` — standard request/response.
  - `POST /query/stream` — SSE streaming (`delta` + `final`).
- Chat
  - `POST /chat` / `POST /chat/stream`.
  - `GET /chat/sessions` / `GET /chat/{session_id}/history` / `DELETE /chat/{session_id}`.
  - `POST /chat/sessions/{session_id}/branch` — fork the session from a specific turn into a new one (`parent_session_id` + `parent_turn_id` are persisted on the child).
- Operator
  - `GET /metrics/recent` / `GET /logs/recent` — power the live status page.
  - `GET /preferences` / `PUT /preferences` — per-cookie UI preferences.
  - `GET /settings/schema` — declarative form schema for the in-app settings panel.
  - `GET /webhooks` / `POST /webhooks` / `PATCH /webhooks/{id}` / `DELETE /webhooks/{id}` / `POST /webhooks/{id}/test` — subscription CRUD plus a test-delivery endpoint that bypasses the `enabled` flag.
Both `/query` and `/chat` accept an optional `include_citations: true` to receive a structured citations list pointing back to the retrieved chunks (chunk id, document id, source name, page number, section title, score).
`/query` and `/chat` are protected by an in-process sliding-window rate limiter (per-IP and per-cookie). Limits are tuned via `RATE_LIMIT_PER_IP_PER_MIN` (default 60) and `RATE_LIMIT_PER_COOKIE_PER_MIN` (default 600). On rejection the response carries `Retry-After`, `X-RateLimit-Scope`, and a JSON `detail` of the form `{"message", "scope", "retry_after_seconds"}` that the frontend reads to render a toast.
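The sliding-window idea behind this limiter can be sketched with a per-key deque of timestamps. This is an illustrative stand-alone class, not the project's implementation (class and method names are assumptions):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within a rolling `window` in seconds."""

    def __init__(self, limit, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and q[0] <= now - self.window:  # drop timestamps outside the window
            q.popleft()
        if len(q) >= self.limit:
            return False  # caller would respond 429 with Retry-After
        q.append(now)
        return True

# Two independent scopes, mirroring the documented defaults.
ip_limiter = SlidingWindowLimiter(limit=60)       # RATE_LIMIT_PER_IP_PER_MIN
cookie_limiter = SlidingWindowLimiter(limit=600)  # RATE_LIMIT_PER_COOKIE_PER_MIN
# A request passes only when BOTH scopes allow it.
```

Unlike a fixed-minute bucket, a sliding window never admits a burst of 2× the limit straddling a minute boundary.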
For full payload shapes and behaviour, see docs/reference/MANUAL.md.
All behaviour is driven by environment variables loaded into `app/config.py`. The full surface lives in `.env.example`; this section enumerates the knobs that ship with the upgraded pipeline.
| Variable | Default | Purpose |
|---|---|---|
| `LLM_PROVIDER` | `groq` | Primary provider; failover walks the configured set. |
| `GROQ_MODEL` / `OPENROUTER_MODEL` / `OPENAI_MODEL` | provider defaults | Per-provider model id. |
| `LLM_REQUEST_TIMEOUT_SEC` | `25.0` | Per-call timeout for both completion and embedding. |
| `LLM_RETRY_ATTEMPTS` | `2` | Bounded retries per provider before failover. |
| `EMBEDDING_PROVIDER` | `openai` | Cross-provider failover (openai / openrouter / cohere). |
| `OPENAI_EMBEDDING_MODEL` / `OPENROUTER_EMBEDDING_MODEL` / `COHERE_EMBEDDING_MODEL` | model ids | Embedding models per provider. |
| `EMBEDDING_DIM` | `1536` | Dimensionality used for the dense Qdrant vector. |
| Variable | Default | Purpose |
|---|---|---|
| `VECTOR_DB` | `qdrant` | Hybrid (`qdrant`) or dense-only (`milvus`). |
| `QDRANT_URL` / `QDRANT_API_KEY` / `QDRANT_COLLECTION` | local | Qdrant connection. |
| `MILVUS_URI` / `MILVUS_COLLECTION` | local | Milvus connection. |
| `RELATIONAL_DB` | `postgres` | Backing relational store. |
| `DATABASE_URL` / `POSTGRES_URL` / `MYSQL_URL` | — | Async DSNs (`postgres://` is auto-normalised). |
| `DB_SCHEMA` | `public` | Schema namespace for shared databases. |
| Variable | Default | Purpose |
|---|---|---|
| `MAX_CHUNKS_RETURN` | `20` | Initial recall budget before rerank. |
| `RERANK_TOP_N` | `10` | Chunks kept after rerank. |
| `CONFIDENCE_THRESHOLD` | `0.40` | Minimum calibrated confidence to return an answer. |
| `RELEVANCE_MIN_TOP_SCORE` | `0.22` | Top-1 evidence floor (the gate refuses below this). |
| `RELEVANCE_MIN_SECOND_SCORE` | `0.12` | Top-2 evidence floor for support. |
| `RELEVANCE_MIN_SCORE_GAP` | `0.03` | Minimum top-1 vs top-2 margin. |
| `MIN_FALLBACK_OVERLAP` | `0.20` | Minimum answer↔context Jaccard for the extractive fallback. |
| `EXTRACTIVE_FALLBACK_CONFIDENCE` | `0.45` | Confidence assigned when the fallback succeeds. |
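The gating logic these thresholds imply can be sketched as follows. This is a deliberate simplification, assuming the documented defaults; the real checks live in `app/query/gates.py` and may differ in detail:

```python
def passes_evidence_gate(scores,
                         min_top=0.22,     # RELEVANCE_MIN_TOP_SCORE
                         min_second=0.12,  # RELEVANCE_MIN_SECOND_SCORE
                         min_gap=0.03):    # RELEVANCE_MIN_SCORE_GAP
    """Refuse unless reranked scores show strong, supported, separated evidence."""
    if not scores or scores[0] < min_top:
        return False  # no chunk clears the top-1 floor
    if len(scores) > 1:
        if scores[1] < min_second:
            return False  # no supporting second chunk
        if scores[0] - scores[1] < min_gap:
            return False  # top-1 not separated enough from top-2
    return True

def should_answer(scores, confidence, threshold=0.40):  # CONFIDENCE_THRESHOLD
    """Both gates must pass; otherwise the pipeline prefers 'I don't know'."""
    return passes_evidence_gate(scores) and confidence >= threshold
```

Two gates in sequence means a fluent-sounding answer with weak retrieval evidence is still refused.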
| Variable | Default | Purpose |
|---|---|---|
| `HYBRID_RRF_K` | `60` | RRF fusion constant for server-side fusion. |
| `HYBRID_DENSE_LIMIT` | `50` | Dense prefetch limit before fusion. |
| `HYBRID_SPARSE_LIMIT` | `50` | Sparse prefetch limit before fusion. |
| `SPARSE_INDEX_DIR` | `data/sparse_index` | Where the BM25 corpus statistics are persisted. |
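Persisting the BM25 corpus statistics matters because sparse scoring depends on corpus-level quantities (document frequencies, average length) that must be identical at ingest and query time. A minimal BM25 sketch — not the project's actual encoder — makes the dependency visible:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, n_docs, avg_len,
               k1=1.5, b=0.75):
    """Score one document against a query with classic BM25.

    doc_freq, n_docs, and avg_len are corpus statistics; if they drift
    between ingest and query time, scores become incomparable -- which is
    why the pipeline persists them under SPARSE_INDEX_DIR.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # unseen term contributes nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len))
        score += idf * norm
    return score
```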
| Variable | Default | Purpose |
|---|---|---|
| `MMR_ENABLED` | `true` | Enable MMR after rerank. |
| `MMR_LAMBDA` | `0.7` | Relevance vs novelty trade-off (1.0 = pure relevance). |
| `MAX_PER_DOC` | `2` | Hard cap on chunks from a single document post-MMR. |
| `INCLUDE_CITATIONS_DEFAULT` | `false` | Server default when the request omits `include_citations`. |
| `QUERY_REWRITE_ENABLED` | `false` | Feature flag for LLM-driven query rewriting. |
| `QUERY_REWRITE_TRIGGER_SCORE` | `0.40` | Top-score threshold below which a rewrite is attempted. |
| `INTENT_TROUBLESHOOT_TOP_K_BOOST` | `6` | Extra recall when the router detects troubleshooting intent. |
| `INTENT_HOWTO_TOP_K_BOOST` | `4` | Extra recall for "how to" questions. |
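Greedy MMR with a per-document cap, as configured above, can be sketched like this. The function and its signature are illustrative, assuming `MMR_LAMBDA=0.7` and `MAX_PER_DOC=2` defaults; the project's diversifier lives in `app/query/`:

```python
def mmr_select(candidates, sim, top_n, lam=0.7, max_per_doc=2):
    """Greedy Maximal Marginal Relevance with a per-document cap.

    candidates: list of (chunk_id, doc_id, relevance) sorted by relevance.
    sim(a, b): similarity between two chunk ids in [0, 1].
    """
    selected, per_doc = [], {}
    pool = list(candidates)
    while pool and len(selected) < top_n:
        best, best_score = None, float("-inf")
        for cand in pool:
            chunk_id, doc_id, rel = cand
            if per_doc.get(doc_id, 0) >= max_per_doc:  # MAX_PER_DOC hard cap
                continue
            # Penalise similarity to anything already selected (novelty term).
            novelty = max((sim(chunk_id, s[0]) for s in selected), default=0.0)
            score = lam * rel - (1 - lam) * novelty    # MMR_LAMBDA trade-off
            if score > best_score:
                best, best_score = cand, score
        if best is None:
            break  # every remaining candidate is capped out
        selected.append(best)
        pool.remove(best)
        per_doc[best[1]] = per_doc.get(best[1], 0) + 1
    return [c[0] for c in selected]
```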
| Variable | Default | Purpose |
|---|---|---|
| `CHUNK_SIZE` / `CHUNK_OVERLAP` | `512` / `64` | Recursive splitter targets. |
| `EMBED_BATCH_SIZE` | `32` | Embedding batch size during ingestion. |
| `HEADING_AWARE_CHUNKING` | `true` | Keep headings attached to their first paragraph. |
| `OCR_ENABLED` / `OCR_MODE` | `true` / `hybrid` | Tesseract, Vision, or hybrid OCR. |
| `OCR_VISION_FALLBACK_ENABLED` | `false` | Use the Vision LLM only on low-confidence Tesseract pages. |
| `PDF_STORAGE_BACKEND` | `relational` | Where original PDFs are kept (`relational` or `vector`). |
| Variable | Default | Purpose |
|---|---|---|
| `QUERY_STREAM_CHUNK_SIZE` | `40` | Character budget per `delta` event when synthesising streams. |
| `CHAT_HISTORY_TURNS` | `5` | Recent turns surfaced in the chat prompt. |
| `MAX_SESSIONS` | `100` | In-memory session ceiling. |
| `CORS_ALLOW_ORIGINS` | `localhost` | Comma-separated origins allowed by FastAPI. |
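A consumer of `/query/stream` collects `delta` events until the `final` event arrives. The parser below assumes standard `event:` / `data:` SSE framing with JSON data lines; the field names (`text`, and the final payload shape) are illustrative — verify the actual payloads against `docs/reference/MANUAL.md`:

```python
import json

def parse_sse(stream_lines):
    """Collect delta text and the final payload from an iterable of SSE lines."""
    answer_parts, final = [], None
    event = "message"  # SSE default event name
    for line in stream_lines:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            payload = json.loads(line.split(":", 1)[1].strip())
            if event == "delta":
                answer_parts.append(payload.get("text", ""))
            elif event == "final":
                final = payload  # carries confidence / optional citations
        elif line == "":
            event = "message"  # a blank line terminates one SSE message
    return "".join(answer_parts), final
```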
| Variable | Default | Purpose |
|---|---|---|
| `RATE_LIMIT_PER_IP_PER_MIN` | `60` | Per-IP sliding-window cap on `/query` + `/chat`. |
| `RATE_LIMIT_PER_COOKIE_PER_MIN` | `600` | Per-cookie sliding-window cap. Both scopes are evaluated on every request. |
| `INGEST_REDACT_PII` | `false` | Toggle the deterministic PII scrub at ingest. |
| `INGEST_REDACT_BACKEND` | `regex` | `regex` (built-in) or `presidio` (opt-in extra) — combined when set to `presidio`. |
| `WEBHOOKS_ENABLED` | `true` | Global gate for outbound webhook dispatch. The `/webhooks/{id}/test` endpoint always delivers regardless of this flag so operators can validate signatures. |
| `WEBHOOKS_TIMEOUT_SEC` | `5.0` | Per-call timeout for webhook deliveries. |
| Variable | Default | Purpose |
|---|---|---|
| `AUTO_SUMMARY_ENABLED` | `true` | One LLM call after ingest writes a 2-3 sentence abstract into `documents.summary`. |
| `ANSWER_CACHE_ENABLED` | `true` | LRU answer cache keyed by (question, top_k, service_category, corpus_version). |
| `ANSWER_CACHE_BACKEND` | `memory` | `memory` (LRU) or `redis` (shared, multi-worker). |
| `ANSWER_CACHE_MAXSIZE` / `ANSWER_CACHE_TTL_SEC` | `256` / `600` | Cache sizing knobs. |
| `HYDE_ENABLED` / `MULTI_QUERY_ENABLED` | `false` / `false` | Hypothetical-document and multi-query expansion. |
| `CHAT_COREFERENCE_REWRITE` | `false` | Resolves "it"/"that" follow-ups into a standalone search query. |
| `ANSWER_VERIFIER_ENABLED` | `false` | Optional second LLM pass that scores groundedness and regenerates once when weak. |
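An LRU + TTL cache keyed the way the table describes can be sketched with an `OrderedDict`. This is illustrative only — the in-memory backend's real implementation may differ, and the key normalisation (lower-casing the question) is an assumption:

```python
import time
from collections import OrderedDict

class AnswerCache:
    """Tiny LRU cache with per-entry TTL, keyed on the documented tuple."""

    def __init__(self, maxsize=256, ttl=600.0):  # ANSWER_CACHE_MAXSIZE / _TTL_SEC
        self.maxsize, self.ttl = maxsize, ttl
        self.entries = OrderedDict()  # key -> (expires_at, answer)

    @staticmethod
    def key(question, top_k, service_category, corpus_version):
        # Normalisation here is an assumption, not the project's exact scheme.
        return (question.strip().lower(), top_k, service_category, corpus_version)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.entries.get(key)
        if entry is None or entry[0] <= now:
            self.entries.pop(key, None)  # drop expired entries lazily
            return None
        self.entries.move_to_end(key)    # mark as recently used
        return entry[1]

    def put(self, key, answer, now=None):
        now = time.monotonic() if now is None else now
        self.entries[key] = (now + self.ttl, answer)
        self.entries.move_to_end(key)
        if len(self.entries) > self.maxsize:
            self.entries.popitem(last=False)  # evict least-recently used
```

Including `corpus_version` in the key means any re-ingest naturally invalidates stale answers without an explicit flush.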
Re-uploading a file with the same original name (e.g. `Onboarding.pdf`) increments `documents.version` rather than creating a duplicate row. The pipeline keys version lookup off the user-facing filename — never the temp upload path — so a document at `/tmp/{task_id}_Onboarding.pdf` is still tracked as `Onboarding.pdf` in the relational store and Qdrant payload. Vector points carry the canonical `pdf_name`, and the documents page shows a `v2`, `v3`, … chip when `version > 1`.
Set `INGEST_REDACT_PII=1` to scrub PII before embedding. The default backend (`regex`) catches:
`EMAIL`, `URL`, `IPV4`, `CREDIT_CARD`, `AADHAAR`, `PAN`, `SSN`, `PHONE`, `DATE_OF_BIRTH`.
Each match is replaced with a stable `[REDACTED_*]` token, so vectors, BM25 indices, citations, and source previews never see the original value. With `INGEST_REDACT_BACKEND=presidio` (after `pip install -e ".[pii]"`) the regex result is unioned with Presidio's analyzer for broader recall. Aggregate counts per entity are surfaced via the `ingestion.redacted` structured-log line.
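The mechanism can be sketched with a small pattern table; the regexes below are simplified examples covering only a subset of the documented entities, not the project's actual pattern set:

```python
import re

# Illustrative subset of the regex backend; the real pattern set is broader.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}

def redact(text):
    """Replace each match with a stable [REDACTED_*] token before embedding."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Because the tokens are stable per entity type, redacted chunks still embed and search consistently.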
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 49
X-RateLimit-Scope: cookie
Content-Type: application/json

{"detail":{"message":"Too many requests. Please slow down.","scope":"cookie","retry_after_seconds":49}}
```

The UI's `RateLimitError` reads `Retry-After`, `X-RateLimit-Scope`, and the structured `detail` to render a toast such as "Too many requests. Please slow down. — Try again in 49s." The IP scope is enforced first, then the cookie scope, so neither dimension can be bypassed.
Subscriptions are persisted in the relational DB (`webhook_subscriptions` table) and managed entirely through the API:

```bash
# Create
curl -X POST http://localhost:8000/webhooks \
  -H 'Content-Type: application/json' \
  -d '{"event":"ingestion.complete","url":"https://hooks.example/it","secret":"s3cr3t"}'

# List
curl http://localhost:8000/webhooks

# Test (always delivers, regardless of "enabled")
curl -X POST http://localhost:8000/webhooks/<id>/test

# Patch / Delete
curl -X PATCH http://localhost:8000/webhooks/<id> -d '{"enabled":false}' -H 'Content-Type: application/json'
curl -X DELETE http://localhost:8000/webhooks/<id>
```

Every delivery includes an `X-Helpdesk-Event` header and, when a secret is configured, an `X-Helpdesk-Signature` (HMAC-SHA256). Valid event types are listed in `WEBHOOK_EVENTS`: `ingestion.complete`, `query.completed`, `query.refused`.
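On the receiving side, a subscriber should recompute the HMAC over the raw request body and compare in constant time. This sketch assumes the signature header carries a hex digest; verify the exact encoding against `docs/reference/MANUAL.md` before relying on it:

```python
import hashlib
import hmac

def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare in constant time.

    Hex-digest encoding is an assumption here -- check the project's manual
    for the actual X-Helpdesk-Signature format.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value)
```

`hmac.compare_digest` avoids timing side-channels that a plain `==` comparison would leak.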
`init_db()` (run on every startup) executes an idempotent migration step that adds new columns when an older schema is encountered:
| Table | Columns added on demand |
|---|---|
| `documents` | `summary`, `tags`, `version` |
| `chat_sessions` | `title`, `parent_session_id`, `parent_turn_id` |
This means upgrading the application against an existing database does not require a manual `ALTER TABLE` step — the columns appear automatically the first time a newer build boots. Schema-level migrations that rename / drop / re-type columns still go through `scripts/migrate_*.py`.
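The add-column-if-missing pattern is easy to demonstrate against SQLite; the real implementation targets Postgres/MySQL via SQLAlchemy, so treat this as a sketch of the idea, not the project's code:

```python
import sqlite3

def ensure_columns(conn, table, columns):
    """Add any missing columns -- safe to run on every startup.

    `columns` maps name -> SQL type. Existing columns are left untouched,
    which is what makes the migration idempotent.
    """
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for name, sql_type in columns.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
ensure_columns(conn, "documents", {"summary": "TEXT", "version": "INTEGER"})
ensure_columns(conn, "documents", {"summary": "TEXT", "version": "INTEGER"})  # no-op on second run
```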
```bash
python -m app.cli ingest path/to/file.pdf --service-name "IT Helpdesk"
python -m app.cli query "How do I reset the VPN?" --include-citations --top-k 8
python -m app.cli export-corpus dump.jsonl
python -m app.cli clear-cache
python -m app.cli eval --golden tests/data/golden.jsonl
```

Each subcommand imports lazily, so cold-start cost matches what the subcommand actually needs.
Regenerate from the live FastAPI OpenAPI schema:
```bash
# In-process generation (default)
python -m scripts.gen_sdk

# Or against a running server
python -m scripts.gen_sdk --url http://localhost:8000/openapi.json
```

Outputs:

- Python: `app/sdk/python/{__init__.py,client.py}` — `HelpdeskClient` wraps every route on top of `httpx` and accepts a cookie string for auth.
- TypeScript: `helpdesk-ui/src/lib/sdk/{client.ts,models.ts,index.ts}` — `createHelpdeskClient({ baseUrl, cookie })` returns a typed object whose method names match the FastAPI `operationId`s.
Both clients are checked into the repo so consumers do not need a build step; CI runs the generator on schema changes.
Some manuals and reports are scanned images rather than digital text. The ingestion pipeline handles these transparently:
- Detection: page-text density inspection with PyMuPDF; PDFs are classified as `text_pdf`, `image_pdf`, or `mixed_pdf` and routed page-by-page. Standalone image files (PNG/JPG/WEBP/TIFF/BMP) flow through the same OCR pipeline via the `image` extractor.
- OCR pipeline:
  - Primary: local Tesseract OCR (multilingual via `OCR_LANGUAGES`).
  - Optional: Vision-LLM fallback for low-confidence pages.
  - Returns the same `ParsedPage` model used by the regular parser, so chunking and embedding remain unchanged downstream.

Behaviour summary:

- With `OCR_VISION_FALLBACK_ENABLED=false` (default), only Tesseract is used.
- With `true`, Vision OCR is used only for low-confidence OCR pages.
- If enabled but no valid Vision API key exists, ingestion falls back to Tesseract-only without failing.
Drop scanned PDFs and images into the upload zone in the UI, or `POST` them to `/ingest`.
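The page-level classification decision can be sketched from per-page extracted-text lengths. This is a toy version of the density check — the real detector inspects text density with PyMuPDF, and the character threshold here is an assumption:

```python
def classify_pdf(page_text_lengths, min_chars=50):
    """Classify a PDF from the amount of extractable text per page.

    Pages with little extractable text are treated as image pages that
    need OCR; a mix of both yields mixed_pdf, routed page-by-page.
    """
    text_pages = [n >= min_chars for n in page_text_lengths]
    if all(text_pages):
        return "text_pdf"
    if not any(text_pages):
        return "image_pdf"
    return "mixed_pdf"
```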
```text
helpdesk-ui/                          # Next.js frontend
  src/
    app/
      page.tsx                        # Landing
      sign-in/, sign-up/              # Auth shell
      app/{query,chat,documents,status}/  # In-app surfaces
      layout.tsx, globals.css
    components/
      answer/                         # Markdown, code, charts, citations
      app/                            # Sidebar, topbar, command palette, onboarding
      auth/                           # Auth shell + forms
      chat/                           # Composer, sessions rail, message bubble
      documents/                      # Upload zone, doc card, inspect sheet
      landing/                        # Hero, features, demo card, testimonial
      query/                          # One-shot query surface helpers
      status/                         # Status tiles
      ui/                             # Primitives (button, dialog, dropdown, ...)
    lib/                              # auth, motion, utils, api client
    middleware.ts                     # Auth-cookie redirect for /app/*
app/
  api/                                # FastAPI routes and SSE endpoints
  chat/                               # Chat pipeline and session handling
  config.py                           # Centralised settings (Pydantic-based)
  db/                                 # Relational + vector store integrations
  ingestion/
    router.py                         # Format dispatcher
    extractors/                       # pdf, docx, spreadsheet, pptx, text, image
    pdf_parser.py, ocr_parser.py      # PDF + OCR specifics
    chunker.py, sparse.py, pipeline.py
  llm/                                # LLM + embeddings client with provider failover
  models/                             # Pydantic models / schemas
  query/                              # Hybrid search, gates, diversify, rerank, rewrite, RAG, pipelines
  storage/                            # Original-document storage backends
  main.py                             # FastAPI application entrypoint
docs/
  reference/MANUAL.md                 # Detailed system manual and API behaviour
  reference/sampleqna.md              # Sample Q&A
  diagram-v4.html                     # Interactive architecture diagram
tests/
  unit/                               # Isolated unit tests
  integration/                        # API and boundary tests
  e2e/                                # End-to-end workflow tests (mocked + optional live)
scripts/
  bootstrap_start.py                  # Render entrypoint (init schema + collection + uvicorn)
  init_db.py                          # Relational DB initialisation
  init_vector_db.py                   # Vector DB initialisation
  seed_demo.py                        # Demo document ingestion helper
  migrate_qdrant_hybrid.py            # One-shot migration to the named dense + sparse Qdrant collection
data/
  sample_pdfs/                        # Example documents for demo and testing
START_HERE.md                         # Single AI-agent bootstrap prompt
```
- `pytest -q` — runs unit, integration, and mocked E2E suites.
- `RUN_LIVE_E2E=1 pytest tests/e2e/test_live_e2e.py -q` — exercises the live ingest → query path against real Qdrant + provider keys (cleanly skipped if Qdrant or keys are unavailable).
- `ruff check .` and `mypy app/` keep the codebase lint- and type-clean.
- `npm run lint` and `npm run build` from `helpdesk-ui/` keep the frontend lint- and type-clean.
MIT (adjust if you publish under a different license).
