FabIQ

Production multi-agent RAG system for technical documentation intelligence

Built to demonstrate production AI engineering discipline for the ASML AI Engineer role. It covers every requirement in the job description: end-to-end RAG, multi-agent LangGraph orchestration, privilege-aware retrieval, LLM-as-judge evaluation, LangSmith observability, prompt versioning, and a complete operational runbook.

What it does

Engineers at semiconductor manufacturers spend 2–3 hours per shift searching thousands of pages of machine manuals, fab process specs, and compliance guidelines to answer precise technical questions. A wrong answer can stop a production line. FabIQ addresses this with a 5-agent pipeline that applies role-based access control at retrieval time, grounds every answer in citations, and continuously measures its own quality.

Architecture

User query
  → Agent 1: Query understanding  — classifies intent, rewrites for retrieval
  → Agent 2: Privilege check      — maps role → access filter (server-side RBAC)
  → Agent 3: Hybrid retrieval     — vector + BM25 via Azure AI Search
  → Agent 4: Citation grounding   — GPT-4o generates answer, every claim cited
  → Agent 5: LLM-as-judge eval    — Claude scores accuracy / grounding / completeness
  → HITL gate                     — if confidence < 0.60, routes to human review
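
The HITL gate at the end of the pipeline above can be illustrated with a small routing function. This is a minimal sketch, not the project's code: the 0.60 threshold comes from the diagram, but the names `JudgeScores`, `confidence`, and `route_after_eval`, the 0 to 1 score range, and the choice to average the three judge scores into one confidence value are all assumptions made for illustration.

```python
from typing import Literal, TypedDict

HITL_THRESHOLD = 0.60  # threshold stated in the architecture diagram


class JudgeScores(TypedDict):
    """Hypothetical shape of the judge output; scores assumed to lie in [0, 1]."""
    accuracy: float
    grounding: float
    completeness: float


def confidence(scores: JudgeScores) -> float:
    # Assumption: confidence is the unweighted mean of the three judge scores.
    return (scores["accuracy"] + scores["grounding"] + scores["completeness"]) / 3


def route_after_eval(
    scores: JudgeScores, threshold: float = HITL_THRESHOLD
) -> Literal["human_review", "respond"]:
    # Low-confidence answers go to a human reviewer instead of the user.
    return "human_review" if confidence(scores) < threshold else "respond"
```

In a LangGraph state machine, a function of this kind would typically be registered as the condition on a conditional edge leaving the evaluation node, with the two return values naming the next nodes.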

Tech stack

Layer	Technology
LLM (generation)	Azure OpenAI GPT-4o
LLM (evaluation)	Anthropic Claude Sonnet — separate judge model
Vector store	Azure AI Search (hybrid vector + BM25)
Orchestration	LangGraph 5-agent state machine
Observability	LangSmith — every agent call traced
API	FastAPI (async, streaming, Pydantic v2)
Dashboard	Streamlit — query UI + live metrics
Prompt versioning	JSON config with version registry
CI/CD	GitHub Actions (lint → test → docker build → eval regression)
Containers	Docker multi-stage + docker-compose
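
To make the "JSON config with version registry" row more concrete, here is one way such a registry could work. The JSON layout, the `PromptRegistry` class, and the prompt texts are hypothetical; the actual schema of prompts.json is not shown in this README and may differ.

```python
import json
from typing import Optional

# Hypothetical prompts.json layout: each prompt has named versions
# and a pointer to the version currently in use.
EXAMPLE_PROMPTS_JSON = """
{
  "query_understanding": {
    "active": "v2",
    "versions": {
      "v1": "Classify the intent of: {query}",
      "v2": "Classify the intent of the question and rewrite it for retrieval: {query}"
    }
  }
}
"""


class PromptRegistry:
    """Looks up prompt templates by name and (optionally) pinned version."""

    def __init__(self, raw_json: str) -> None:
        self._prompts = json.loads(raw_json)

    def get(self, name: str, version: Optional[str] = None) -> str:
        entry = self._prompts[name]
        chosen = version or entry["active"]  # default to the active version
        if chosen not in entry["versions"]:
            raise KeyError(f"prompt {name!r} has no version {chosen!r}")
        return entry["versions"][chosen]
```

Keeping old versions in the file means a rollback is a one-line change to the `active` pointer, and an eval run can pin a specific version for comparison.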

Project structure

fabiq/
├── src/fabiq/
│   ├── agents/          # 5 LangGraph agents
│   ├── pipeline/        # graph.py + prompt_registry.py + prompts.json
│   ├── ingestion/       # loader, chunker (3 strategies), embedder
│   ├── retrieval/       # Azure AI Search hybrid search + RBAC filter
│   ├── api/             # FastAPI routes (/ingest, /query, /health)
│   └── observability/   # LangSmith tracing
├── dashboard/           # Streamlit observability dashboard
├── eval/                # Golden dataset (30 Q&A) + eval runner
├── tests/               # 65 passing tests (Day 1 + Day 2 coverage)
├── docs/
│   ├── ADR-001-chunking-strategy.md
│   ├── ADR-002-evaluation-framework.md
│   └── RUNBOOK.md
├── .github/workflows/   # CI pipeline
├── Dockerfile
└── docker-compose.yml

Quickstart

# 1. Clone and install
git clone https://github.com/YOUR_USERNAME/fabiq.git && cd fabiq
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure Azure credentials
cp .env.example .env
# Fill in AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY,
#         AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_API_KEY
# Optional: ANTHROPIC_KEY (LLM-as-judge), LANGSMITH_API_KEY (tracing)

# 3. Run tests (no Azure needed)
PYTHONPATH=src:. pytest tests/ -v --no-cov    # → 65 passed

# 4. Validate eval dataset
PYTHONPATH=src:. python eval/run_eval.py --dry-run    # → 30 questions listed

# 5. Start the server
PYTHONPATH=src uvicorn fabiq.api.main:app --reload --port 8000

# 6. Start the dashboard
PYTHONPATH=src streamlit run dashboard/app.py

# 7. Run full eval (requires Azure + Anthropic credentials)
PYTHONPATH=src:. python eval/run_eval.py

Key design decisions

See docs/ADR-001-chunking-strategy.md — why recursive is the default over fixed or semantic.

See docs/ADR-002-evaluation-framework.md — why LLM-as-judge + golden dataset rather than human evaluation for every response; HITL threshold justification; cost model.

See docs/RUNBOOK.md — how to update a prompt safely, run eval regression, monitor in production, and handle incidents.
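
For readers unfamiliar with the recursive strategy that ADR-001 selects as the default, the sketch below shows the general idea: split on the coarsest separator first, merge pieces greedily up to a size limit, and only fall back to finer separators for pieces that are still too large. It is an illustration of the technique, not the chunker in src/fabiq/ingestion/; the function name, separator order, and character-based limit are assumptions.

```python
def recursive_chunk(
    text: str,
    max_chars: int = 200,
    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " "),
) -> list[str]:
    """Split text into chunks of at most max_chars, preferring coarse boundaries."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-size windows.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    buffer = ""
    for piece in (p for p in text.split(sep) if p.strip()):
        candidate = f"{buffer}{sep}{piece}" if buffer else piece
        if len(candidate) <= max_chars:
            buffer = candidate  # keep merging while the chunk still fits
            continue
        if buffer:
            chunks.append(buffer)
            buffer = ""
        if len(piece) <= max_chars:
            buffer = piece
        else:
            # Piece is too large on its own: retry with finer separators.
            chunks.extend(recursive_chunk(piece, max_chars, finer))
    if buffer:
        chunks.append(buffer)
    return chunks
```

Compared with fixed-size windows, this keeps paragraphs and sentences together whenever they fit, which tends to produce chunks that read as complete units when cited.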

Evaluation

The 30-question golden dataset spans three tiers:

Tier 1 (10 questions): Factual lookups — single-source, specific answers
Tier 2 (10 questions): Procedural — step-based how-to questions
Tier 3 (10 questions): Multi-hop — require reasoning across multiple documents

Each question has expected keywords for lightweight lexical regression plus reference answers for the LLM-as-judge eval. Results are written to eval/results.jsonl with per-question scores for trend analysis.
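
The lightweight lexical regression described above could be as simple as the following sketch. The function names, the record fields, and the sample answer in the usage note are illustrative assumptions; the actual format of eval/results.jsonl is not documented here.

```python
import json


def keyword_recall(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    text = answer.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


def to_result_line(
    question_id: str, tier: int, answer: str, expected_keywords: list[str]
) -> str:
    """Serialize one per-question score as a single JSONL line (hypothetical fields)."""
    record = {
        "id": question_id,
        "tier": tier,
        "keyword_recall": keyword_recall(answer, expected_keywords),
    }
    return json.dumps(record)
```

For example, with a made-up answer "Set the chamber pressure to 5 mTorr before venting." and expected keywords `["pressure", "mTorr", "purge"]`, two of the three keywords match. A check like this is cheap enough to run on every CI build, while the LLM-as-judge pass covers what keyword matching cannot.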

Tests

65 passing tests covering:
  ├── 21 chunker tests   (fixed, recursive, semantic strategies)
  ├── 17 loader tests    (PDF, markdown, directory loading, RBAC metadata)
  ├── 7  search tests    (RBAC filter logic, OData syntax, SearchResult)
  ├── 13 agent tests     (all 5 agents with mocked Azure/Anthropic clients)
  └── 7  pipeline tests  (graph compilation, HITL routing, golden dataset)

Published research

LLM2Manim — arXiv preprint on converting natural-language STEM questions to animated visual explanations via LLM pipelines. Demonstrates prior production AI engineering work.

Built in 72 hours as a portfolio project demonstrating production AI engineering for the ASML AI Engineer role. Not affiliated with ASML or UTIS LLC.

Deployment modes: Azure production + local reviewer mode

FabIQ supports two execution modes: the Azure implementation stays intact, and reviewers can still run the project without cloud credentials.

1. Azure production mode

This is the intended production architecture:

Azure OpenAI for embeddings and chat generation
Azure AI Search for hybrid retrieval and RBAC filtering
LangGraph for the 5-agent orchestration pipeline
Optional Anthropic judge for evaluation
Optional LangSmith tracing

Use this mode when Azure credentials and model quota are available:

APP_MODE=azure
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_API_VERSION=2024-02-01
AZURE_OPENAI_EMBEDDING_DEPLOYMENT=text-embedding-3-small
AZURE_OPENAI_CHAT_DEPLOYMENT=gpt-4o-mini
AZURE_SEARCH_ENDPOINT=https://your-search.search.windows.net
AZURE_SEARCH_API_KEY=...
AZURE_SEARCH_INDEX_NAME=fabiq-docs
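
In Azure mode, the role-to-filter mapping performed by the privilege-check agent could produce an OData filter string like the one below, which Azure AI Search then applies server-side. This is a sketch under assumptions: the `access_level` field name is taken from the ingest request shown later in this README and `field_engineer` from the query example, but the other role, the level names beyond `public`, and the role-to-level mapping are invented for illustration.

```python
# Hypothetical role -> allowed access levels mapping.
ROLE_ACCESS = {
    "field_engineer": ["public", "internal"],
    "process_engineer": ["public", "internal", "restricted"],
}
DEFAULT_ACCESS = ["public"]  # unknown roles only see public documents


def build_access_filter(role: str, field: str = "access_level") -> str:
    """Build an OData filter that restricts results to the role's access levels."""
    levels = ROLE_ACCESS.get(role, DEFAULT_ACCESS)
    joined = ",".join(levels)
    # search.in(field, 'a,b', ',') matches documents whose field equals any listed value.
    return f"search.in({field}, '{joined}', ',')"
```

The resulting string would be passed as the `filter` argument of the search request, so documents outside the role's access levels are excluded before ranking rather than hidden afterwards.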

2. Local reviewer/demo mode

This mode does not require Azure login, Azure quota, Azure OpenAI, or Azure AI Search.

It demonstrates the core system flow:

document upload
chunking
local embeddings
local vector retrieval
role-based access filtering
citation formatting
FastAPI routes
dashboard integration

Local mode intentionally uses extractive/template-based answer generation instead of a cloud LLM. Full generative answer quality is available in Azure mode.

APP_MODE=local
LOCAL_INDEX_PATH=data/local_index.json
LOCAL_EMBEDDING_MODEL=hash

Optional: set LOCAL_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 for stronger local semantic retrieval if the model is installed/cached locally.
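
One plausible reading of `LOCAL_EMBEDDING_MODEL=hash` is a hashing-trick embedding: each token is hashed into a fixed-size vector, so no model download is needed. The sketch below illustrates that idea together with cosine-similarity retrieval. It is an assumption about what the setting means, not the project's implementation; `hash_embed`, `cosine`, `top_k`, and the 256-dimension default are hypothetical.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 256) -> list[float]:
    """Map text to a unit-length vector by hashing each whitespace token."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # signed hashing reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def top_k(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = hash_embed(query)
    return sorted(docs, key=lambda d: cosine(q, hash_embed(d)), reverse=True)[:k]
```

An embedding like this only captures token overlap, which is consistent with the note above that a sentence-transformers model gives stronger semantic retrieval.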

Run locally without Azure

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# keep APP_MODE=local
PYTHONPATH=src uvicorn fabiq.api.main:app --reload --port 8000

Check health:

curl http://localhost:8000/health

Ingest the included sample document:

curl -X POST "http://localhost:8000/ingest/" \
  -F "file=@sample_data/local_demo.md" \
  -F "access_level=public" \
  -F "chunk_strategy=recursive"

Query the local index:

curl -X POST "http://localhost:8000/query/" \
  -H "Content-Type: application/json" \
  -d '{"query":"What does local demo mode demonstrate?","role":"field_engineer"}'

Why local mode exists

FabIQ was implemented with Azure resources as the primary cloud architecture. Student/free Azure subscriptions can expire, lose quota, or restrict model deployment by region. Local mode allows reviewers to verify that the ingestion, retrieval, RBAC, citation, API, and dashboard workflow still runs without needing the author's Azure credentials.
