USA DMV RAG (Driver Manual Assistant)

A production-grade Retrieval-Augmented Generation (RAG) chatbot that answers driving rules questions using official DMV manuals from all 50 US states plus DC. Built with hybrid vector search, cross-encoder reranking, Redis caching, and Llama 4 via Groq, with a Streamlit UI that streams responses.


System Architecture

[Architecture diagram]

Project Demo

[Screenshot of the chatbot UI]

Project Flow

User Query
    │
    ▼
State Detection          ← Detects state from query text (e.g. "California" → "CA")
    │
    ▼
Redis Cache Check        ← Return instantly if same query was asked before
    │ (cache miss)
    ▼
Hybrid Search            ← Dense vector search (Qdrant) + Keyword search, merged via RRF
    │
    ▼
S3 Hydration             ← Fetch full chunk text from S3 (Qdrant stores only previews)
    │
    ▼
Reranking                ← Cross-encoder scores each chunk for true relevance
    │
    ▼
LLM Generation           ← Groq (Llama 4) generates cited answer from top 5 chunks
    │
    ▼
Cache Write + Response   ← Store in Redis, return answer with sources to user

Key Concepts

Embeddings

Text chunks are converted into 1024-dimensional numerical vectors using BAAI/bge-large-en-v1.5. Semantically similar text produces numerically close vectors, enabling meaning-based search rather than keyword matching.
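"Numerically close" here usually means high cosine similarity. The toy 3-dimensional vectors below are hand-made stand-ins (real embeddings from the HF API are 1024-dimensional), but they show the idea: related sentences score higher than unrelated ones.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-d stand-ins for real 1024-d embeddings.
v_speed_limit = [0.9, 0.1, 0.2]   # "school zone speed limit"
v_speed_rules = [0.8, 0.2, 0.3]   # "speed rules near schools"
v_boat_reg    = [0.1, 0.9, 0.1]   # "boat registration fees"

# The two speed-related vectors are closer to each other than to the unrelated one.
assert cosine_similarity(v_speed_limit, v_speed_rules) > cosine_similarity(v_speed_limit, v_boat_reg)
```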

Qdrant (Vector Database)

Stores all 7,644 chunk vectors and indexes them using HNSW graphs for millisecond similarity search. When a user asks a question, the query is embedded and Qdrant finds the closest matching vectors filtered by state.

Hybrid Search

Combines dense vector search (semantic similarity via Qdrant) with keyword search (term frequency over chunk previews), then merges both result lists using Reciprocal Rank Fusion (RRF). This catches cases where one method misses what the other finds.

Reciprocal Rank Fusion (RRF)

A rank merging algorithm that scores each chunk as 1 / (60 + rank) from both search methods and sums the scores. Chunks appearing highly ranked in both searches get the highest combined score.
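A minimal RRF implementation of the formula above (with the same k = 60 constant) might look like this; the function name and ID-list interface are assumptions for illustration:

```python
def rrf_merge(dense_ids: list[str], keyword_ids: list[str], k: int = 60) -> list[str]:
    """Merge two ranked ID lists with Reciprocal Rank Fusion.

    Each chunk scores 1 / (k + rank) in every list it appears in,
    and the scores are summed; ranks are 1-based.
    """
    scores: dict[str, float] = {}
    for ranked in (dense_ids, keyword_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

With `dense_ids = ["a", "b", "c"]` and `keyword_ids = ["b", "c", "d"]`, the chunks appearing in both lists ("b" and "c") outrank those found by only one method.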

Reranking

A cross-encoder model (BAAI/bge-reranker-v2-m3) scores each query-chunk pair jointly for true relevance. Unlike embeddings, which encode the query and chunk separately, the cross-encoder reads both together and produces a more accurate relevance score.
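Structurally, the reranking step is "score every pair, keep the top k". The sketch below abstracts the scorer as a callable; in the real pipeline `score_fn` would call the cross-encoder via the HF Inference API, so everything here is illustrative:

```python
from typing import Callable

def rerank(query: str, chunks: list[str],
           score_fn: Callable[[str, str], float], top_k: int = 5) -> list[str]:
    """Score each (query, chunk) pair jointly and keep the top_k chunks.

    score_fn is any callable taking (query, chunk) and returning a float;
    the real system would plug in the cross-encoder here.
    """
    scored = sorted(chunks, key=lambda c: score_fn(query, c), reverse=True)
    return scored[:top_k]
```

Even a crude scorer such as word overlap shows the mechanics: `rerank("school zone speed", chunks, lambda q, c: len(set(q.split()) & set(c.split())), top_k=2)` keeps the two chunks sharing the most words with the query.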

Prompt Engineering

The system prompt instructs the LLM to only use provided context, always cite sources as (STATE Manual, Page X), and explicitly say when it cannot find an answer rather than hallucinating. The RAG prompt formats each chunk with its source header before sending to the LLM.
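A prompt builder along those lines could be sketched as follows. The chunk schema (`state`, `page`, `text` keys) and the exact instruction wording are assumptions; the real `generation/prompt.py` may differ:

```python
def build_rag_prompt(query: str, chunks: list[dict]) -> str:
    """Format each chunk under a source header so the LLM can cite pages."""
    blocks = [
        f"[Source: {c['state']} Manual, Page {c['page']}]\n{c['text']}"
        for c in chunks
    ]
    context = "\n\n".join(blocks)
    return (
        "Answer using ONLY the context below. Cite sources as "
        "(STATE Manual, Page X). If the answer is not in the context, "
        "say you cannot find it rather than guessing.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Keeping the source header adjacent to each chunk's text is what lets the model emit citations like (CA Manual, Page 73) without inventing page numbers.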

Redis Caching (Two-tier)

Redis stores two types of data: LLM responses (keyed by MD5 hash of query + state, TTL 24h) and hot chunks promoted from S3 (chunks accessed 10+ times get cached for 7 days). This prevents redundant LLM calls and reduces S3 fetch latency.
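The response-cache key described above (MD5 of query + state) can be sketched like this; the `llm:` prefix and the strip/lowercase normalization are illustrative assumptions, not necessarily what `cache/redis_cache.py` does:

```python
import hashlib

def cache_key(query: str, state: str) -> str:
    """Deterministic Redis key: MD5 of the normalized query plus the state."""
    raw = f"{query.strip().lower()}|{state}"
    return "llm:" + hashlib.md5(raw.encode("utf-8")).hexdigest()
```

Normalizing before hashing means trivially different phrasings of the same query (extra whitespace, capitalization) hit the same cache entry; in the real system the value would be written with `SETEX` and the 24h TTL from `REDIS_TTL_SECONDS`.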

4-Tier Storage

| Tier | Storage | What Lives Here |
|------|---------|-----------------|
| 1 | Redis | Cached LLM responses + hot chunks |
| 2 | Qdrant | Vectors + metadata + text previews |
| 3 | S3 Standard | Full chunk JSON files |
| 4 | S3 Glacier | Original PDF manuals (cold archive) |
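The promotion logic between Tier 1 and Tier 3 can be illustrated with in-memory stand-ins for Redis and S3. This is a sketch of the access-count idea only; the real `cache/tier_manager.py` talks to actual Redis and S3 clients:

```python
# In-memory stand-ins for Redis and S3, to show the promotion logic only.
redis_sim: dict[str, str] = {}
s3_sim = {"chunk-1": "Full text of chunk 1."}
access_counts: dict[str, int] = {}
PROMOTE_THRESHOLD = 10  # mirrors REDIS_PROMOTE_THRESHOLD

def fetch_chunk(chunk_id: str) -> str:
    """Try Tier 1 first; fall back to Tier 3 and promote hot chunks."""
    if chunk_id in redis_sim:                      # Tier 1 hit
        return redis_sim[chunk_id]
    text = s3_sim[chunk_id]                        # Tier 3 fetch
    access_counts[chunk_id] = access_counts.get(chunk_id, 0) + 1
    if access_counts[chunk_id] >= PROMOTE_THRESHOLD:
        redis_sim[chunk_id] = text                 # promote (7-day TTL in real Redis)
    return text
```

After ten fetches of the same chunk it is served from the Redis stand-in instead of "S3", which is exactly the latency win the two-tier design is after.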

Project Structure

dmv-rag/
├── ingestion/
│   ├── download_pdfs.py      # Downloads all 51 state manuals
│   ├── parse_pdfs.py         # Extracts text via PyMuPDF + pdfplumber
│   ├── chunk.py              # Splits pages into 512-token chunks
│   └── embed_and_upload.py   # Embeds chunks, uploads to Qdrant + S3
│
├── retrieval/
│   ├── state_detector.py     # Detects US state from query text
│   ├── hybrid_search.py      # Dense + keyword search with RRF fusion
│   └── reranker.py           # Cross-encoder reranking via HF API
│
├── generation/
│   ├── prompt.py             # System prompt + RAG prompt builder
│   └── llm.py                # Groq (primary) + OpenAI (fallback) caller
│
├── cache/
│   ├── redis_cache.py        # LLM response cache
│   └── tier_manager.py       # S3 fetcher with Redis promotion
│
├── api/
│   └── main.py               # FastAPI app with /query, /health, /cache/stats
│
├── frontend/
│   └── app.py                # Streamlit chatbot UI
│
├── data/                     # Local data (gitignored)
│   ├── pdfs/                 # Downloaded state manuals
│   ├── parsed/               # Extracted text JSON
│   └── chunks/               # Chunked text JSON
│
├── requirements.txt
└── .env

Tech Stack

| Component | Technology |
|-----------|------------|
| LLM | Groq (Llama 4 Scout 17B) |
| Embeddings | BAAI/bge-large-en-v1.5 via HF Inference API |
| Reranker | BAAI/bge-reranker-v2-m3 via HF Inference API |
| Vector DB | Qdrant (Docker) |
| Cache | Redis 7 (Docker) |
| Object Storage | AWS S3 |
| PDF Parsing | PyMuPDF + pdfplumber |
| Chunking | LangChain RecursiveCharacterTextSplitter + tiktoken |
| API | FastAPI + uvicorn |
| Frontend | Streamlit |

Setup

Prerequisites

  • Python 3.10
  • Docker Desktop
  • AWS account with S3 bucket
  • Groq API key (free at console.groq.com)
  • HuggingFace API token (free at huggingface.co/settings/tokens)

Environment Variables

Create a .env file in the project root:

# HuggingFace
HF_API_TOKEN=hf_xxx
EMBED_MODEL=BAAI/bge-large-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-v2-m3

# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=dmv_manuals

# AWS
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1
S3_CHUNKS_BUCKET=your-chunks-bucket
S3_PDFS_BUCKET=your-pdfs-bucket

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_TTL_SECONDS=86400
REDIS_PROMOTE_THRESHOLD=10

# LLM
GROQ_API_KEY=xxx
OPENAI_API_KEY=xxx
LLM_MODEL=meta-llama/llama-4-scout-17b-16e-instruct

# Search
TOP_K_RETRIEVE=20
TOP_K_RERANK=5

Installation

# Clone and setup
git clone https://github.com/yourusername/dmv-rag.git
cd dmv-rag
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Start Docker services
docker start qdrant redis
# or first time:
docker run -d --name qdrant -p 6333:6333 -v ~/qdrant_data:/qdrant/storage qdrant/qdrant
docker run -d --name redis -p 6379:6379 redis:7-alpine

Data Ingestion (first time only)

python ingestion/download_pdfs.py    # Download 51 state PDFs
python ingestion/parse_pdfs.py       # Extract text
python ingestion/chunk.py            # Split into chunks
python ingestion/embed_and_upload.py # Embed + upload to Qdrant + S3

Running the App

# Terminal 1 — API
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

# Terminal 2 — Frontend
streamlit run frontend/app.py

Open http://localhost:8501


API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /health | Health check + collection name |
| GET | /cache/stats | Redis cache statistics |
| POST | /query | Main RAG query endpoint |

Example Query

curl -X POST http://localhost:8000/query \
  -H 'Content-Type: application/json' \
  -d '{"query": "What is the speed limit in school zones in California?"}'

Response:

{
  "answer": "The speed limit in school zones in California is 25 mph...",
  "state_detected": "CA",
  "sources": [{"state": "CA", "page": 73, "rerank_score": 0.424}],
  "cached": false,
  "model_used": "meta-llama/llama-4-scout-17b-16e-instruct"
}

Data Stats

  • 51 state driver manuals (50 states + DC)
  • ~4,200 pages of text extracted
  • 7,644 chunks at 512 tokens each
  • 1024-dimensional embeddings

Comparative Analysis

The project includes a pre-computed analysis module that scores and ranks all 51 state DMV manuals across three dimensions.

Scoring Dimensions

| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| Content Depth | 20% | Avg words per page, total pages, image-only pages |
| Readability | 20% | Flesch Reading Ease, FK Grade, Gunning Fog, SMOG |
| Topic Coverage | 60% | Keyword frequency across 7 categories |

Topic Categories

Test Prep · Safety · Legal Compliance · Teen Rules · Emergency · Registration · Commercial

Weighted Score Formula

weighted_score = (0.20 × content_depth) + (0.20 × readability) + (0.60 × topic_coverage)
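The formula above translates directly into code; the assumption here is that each dimension score is already normalized to the same scale before weighting:

```python
def weighted_score(content_depth: float, readability: float, topic_coverage: float) -> float:
    """Combine the three normalized dimension scores with the 20/20/60 weights."""
    return 0.20 * content_depth + 0.20 * readability + 0.60 * topic_coverage
```

For example, a manual scoring 0.5 on depth, 0.0 on readability, and 1.0 on topic coverage gets 0.20 × 0.5 + 0.20 × 0.0 + 0.60 × 1.0 = 0.70, reflecting how heavily topic coverage dominates the ranking.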

Cross-State Inconsistency Detection

10 standard questions (BAC limits, school zone speeds, minimum age, DUI penalties, etc.) are queried against all 51 states via the RAG pipeline. Reranker confidence scores surface which states document each topic clearly and which have coverage gaps.

Running the Analysis

# Pre-compute stats (run once)
python -m analysis.compute_stats
python -m analysis.cross_state_compare

# View dashboard
streamlit run frontend/app.py
# Navigate to "Manual Comparison" in the sidebar

Credits

Developed by Yash Mahajan under the guidance of Dr. Rakesh Mahto and Dr. Deepak Sharma.
