A production-grade Retrieval-Augmented Generation (RAG) chatbot that answers driving rules questions using official DMV manuals from all 50 US states + DC. Built with hybrid vector search, cross-encoder reranking, Redis caching, and Llama 4 via Groq.
User Query
│
▼
State Detection ← Detects state from query text (e.g. "California" → "CA")
│
▼
Redis Cache Check ← Return instantly if same query was asked before
│ (cache miss)
▼
Hybrid Search ← Dense vector search (Qdrant) + Keyword search, merged via RRF
│
▼
S3 Hydration ← Fetch full chunk text from S3 (Qdrant stores only previews)
│
▼
Reranking ← Cross-encoder scores each chunk for true relevance
│
▼
LLM Generation ← Groq (Llama 4) generates cited answer from top 5 chunks
│
▼
Cache Write + Response ← Store in Redis, return answer with sources to user
Text chunks are converted into 1024-dimensional numerical vectors using BAAI/bge-large-en-v1.5. Semantically similar text produces numerically close vectors, enabling meaning-based search rather than keyword matching.
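"Numerically close" here means high cosine similarity between vectors. A minimal stdlib sketch of the idea, using toy 4-dimensional vectors in place of real 1024-dimensional bge embeddings (the vectors and topic labels are illustrative, not actual model outputs):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings of three text chunks.
school_zone = [0.9, 0.1, 0.3, 0.0]
speed_limit = [0.8, 0.2, 0.4, 0.1]   # semantically close topic
vehicle_reg = [0.1, 0.9, 0.0, 0.7]   # unrelated topic

# Meaning-based search: the semantically related chunk scores higher.
print(cosine_similarity(school_zone, speed_limit) >
      cosine_similarity(school_zone, vehicle_reg))
```

Vector search engines like Qdrant apply the same principle at scale, returning the chunks whose vectors score highest against the embedded query.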
Stores all 7,644 chunk vectors and indexes them using HNSW graphs for millisecond similarity search. When a user asks a question, the query is embedded and Qdrant finds the closest matching vectors filtered by state.
Combines dense vector search (semantic similarity via Qdrant) with keyword search (term frequency over chunk previews), then merges both result lists using Reciprocal Rank Fusion (RRF). This catches cases where one method misses what the other finds.
A rank-merging algorithm that scores each chunk as 1 / (60 + rank) in each result list and sums the scores across lists. Chunks ranked highly by both search methods receive the highest combined score.
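The RRF formula above fits in a few lines. A self-contained sketch, assuming each search method returns an ordered list of chunk IDs (the IDs and rankings below are hypothetical):

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each chunk scores 1 / (k + rank) per list, summed."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, chunk_id in enumerate(results, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense   = ["c3", "c1", "c7"]   # hypothetical dense-search ranking
keyword = ["c1", "c3", "c9"]   # hypothetical keyword-search ranking

# c1 and c3 appear in both lists, so they rise above c7 and c9.
print(reciprocal_rank_fusion([dense, keyword]))
```

The constant k = 60 dampens the gap between adjacent ranks, so a chunk ranked 2nd in both lists beats one ranked 1st in only one list.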
A cross-encoder model (BAAI/bge-reranker-v2-m3) scores each query-chunk pair jointly for true relevance. Unlike embeddings which encode query and chunk separately, the cross-encoder reads both together and produces a more accurate relevance score.
The system prompt instructs the LLM to use only the provided context, always cite sources as (STATE Manual, Page X), and explicitly say when it cannot find an answer rather than hallucinate. The RAG prompt formats each chunk with its source header before sending it to the LLM.
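A minimal sketch of that prompt assembly. The exact instruction wording and chunk-dict keys are assumptions for illustration; only the citation format (STATE Manual, Page X) comes from the description above:

```python
# Hedged sketch: instruction text is illustrative, not the project's verbatim prompt.
SYSTEM_PROMPT = (
    "Answer only from the provided context. "
    "Cite every claim as (STATE Manual, Page X). "
    "If the context does not contain the answer, say so instead of guessing."
)

def build_rag_prompt(query: str, chunks: list[dict]) -> str:
    """Prefix each chunk with its source header, then append the question."""
    context = "\n\n".join(
        f"[{c['state']} Manual, Page {c['page']}]\n{c['text']}" for c in chunks
    )
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt(
    "What is the school zone speed limit?",
    [{"state": "CA", "page": 73, "text": "The speed limit in a school zone is 25 mph."}],
)
print(prompt)
```

The source header on each chunk is what lets the model emit page-level citations instead of attributing everything to the manual as a whole.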
Redis stores two types of data: LLM responses (keyed by MD5 hash of query + state, TTL 24h) and hot chunks promoted from S3 (chunks accessed 10+ times get cached for 7 days). This prevents redundant LLM calls and reduces S3 fetch latency.
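The cache key and promotion rule can be sketched with the stdlib. The query normalization and the "llm:" key prefix are assumptions; the MD5-of-query-plus-state scheme, 10-access threshold, and 7-day hot-chunk TTL come from the description above:

```python
import hashlib

def cache_key(query: str, state: str) -> str:
    """MD5 of normalized query + state (normalization is an assumed detail)."""
    raw = f"{query.strip().lower()}|{state}"
    return "llm:" + hashlib.md5(raw.encode("utf-8")).hexdigest()

# Hot-chunk promotion: chunks fetched from S3 10+ times get a 7-day Redis TTL.
PROMOTE_THRESHOLD = 10
HOT_CHUNK_TTL = 7 * 24 * 3600  # seconds

def should_promote(access_count: int) -> bool:
    return access_count >= PROMOTE_THRESHOLD

key = cache_key("What is the speed limit in school zones?", "CA")
print(key)
```

Hashing the normalized query means trivially different phrasings of the same question ("  California BAC limit" vs "california bac limit") hit the same cache entry.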
| Tier | Storage | What Lives Here |
|---|---|---|
| 1 | Redis | Cached LLM responses + hot chunks |
| 2 | Qdrant | Vectors + metadata + text previews |
| 3 | S3 Standard | Full chunk JSON files |
| 4 | S3 Glacier | Original PDF manuals (cold archive) |
dmv-rag/
├── ingestion/
│ ├── download_pdfs.py # Downloads all 51 state manuals
│ ├── parse_pdfs.py # Extracts text via PyMuPDF + pdfplumber
│ ├── chunk.py # Splits pages into 512-token chunks
│ └── embed_and_upload.py # Embeds chunks, uploads to Qdrant + S3
│
├── retrieval/
│ ├── state_detector.py # Detects US state from query text
│ ├── hybrid_search.py # Dense + keyword search with RRF fusion
│ └── reranker.py # Cross-encoder reranking via HF API
│
├── generation/
│ ├── prompt.py # System prompt + RAG prompt builder
│ └── llm.py # Groq (primary) + OpenAI (fallback) caller
│
├── cache/
│ ├── redis_cache.py # LLM response cache
│ └── tier_manager.py # S3 fetcher with Redis promotion
│
├── api/
│ └── main.py # FastAPI app with /query, /health, /cache/stats
│
├── frontend/
│ └── app.py # Streamlit chatbot UI
│
├── data/ # Local data (gitignored)
│ ├── pdfs/ # Downloaded state manuals
│ ├── parsed/ # Extracted text JSON
│ └── chunks/ # Chunked text JSON
│
├── requirements.txt
└── .env
| Component | Technology |
|---|---|
| LLM | Groq — Llama 4 Scout 17B |
| Embeddings | BAAI/bge-large-en-v1.5 via HF Inference API |
| Reranker | BAAI/bge-reranker-v2-m3 via HF Inference API |
| Vector DB | Qdrant (Docker) |
| Cache | Redis 7 (Docker) |
| Object Storage | AWS S3 |
| PDF Parsing | PyMuPDF + pdfplumber |
| Chunking | LangChain RecursiveCharacterTextSplitter + tiktoken |
| API | FastAPI + uvicorn |
| Frontend | Streamlit |
- Python 3.10
- Docker Desktop
- AWS account with S3 bucket
- Groq API key (free at console.groq.com)
- HuggingFace API token (free at huggingface.co/settings/tokens)
Create a .env file in the project root:
# HuggingFace
HF_API_TOKEN=hf_xxx
EMBED_MODEL=BAAI/bge-large-en-v1.5
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
QDRANT_COLLECTION=dmv_manuals
# AWS
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1
S3_CHUNKS_BUCKET=your-chunks-bucket
S3_PDFS_BUCKET=your-pdfs-bucket
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_TTL_SECONDS=86400
REDIS_PROMOTE_THRESHOLD=10
# LLM
GROQ_API_KEY=xxx
OPENAI_API_KEY=xxx
LLM_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
# Search
TOP_K_RETRIEVE=20
TOP_K_RERANK=5

# Clone and setup
git clone https://github.com/yourusername/dmv-rag.git
cd dmv-rag
python3.10 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Start Docker services
docker start qdrant redis
# or first time:
docker run -d --name qdrant -p 6333:6333 -v ~/qdrant_data:/qdrant/storage qdrant/qdrant
docker run -d --name redis -p 6379:6379 redis:7-alpine

python ingestion/download_pdfs.py # Download 51 state PDFs
python ingestion/parse_pdfs.py # Extract text
python ingestion/chunk.py # Split into chunks
python ingestion/embed_and_upload.py # Embed + upload to Qdrant + S3

# Terminal 1 — API
uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
# Terminal 2 — Frontend
streamlit run frontend/app.py

| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check + collection name |
| GET | /cache/stats | Redis cache statistics |
| POST | /query | Main RAG query endpoint |
curl -X POST http://localhost:8000/query \
-H 'Content-Type: application/json' \
-d '{"query": "What is the speed limit in school zones in California?"}'

{
"answer": "The speed limit in school zones in California is 25 mph...",
"state_detected": "CA",
"sources": [{"state": "CA", "page": 73, "rerank_score": 0.424}],
"cached": false,
"model_used": "meta-llama/llama-4-scout-17b-16e-instruct"
}

- 51 state driver manuals (50 states + DC)
- ~4,200 pages of text extracted
- 7,644 chunks at 512 tokens each
- 1024-dimensional embeddings
The project includes a pre-computed analysis module that scores and ranks all 51 state DMV manuals across three dimensions.
| Dimension | Weight | What It Measures |
|---|---|---|
| Content Depth | 20% | avg words per page, total pages, image-only pages |
| Readability | 20% | Flesch Reading Ease, FK Grade, Gunning Fog, SMOG |
| Topic Coverage | 60% | keyword frequency across 7 categories |
Test Prep · Safety · Legal Compliance · Teen Rules · Emergency · Registration · Commercial
weighted_score = (0.20 × content_depth) + (0.20 × readability) + (0.60 × topic_coverage)
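The weighted-score formula above as a worked example. The dimension values are hypothetical and assumed to be normalized to a 0-100 scale before weighting:

```python
def weighted_score(content_depth: float, readability: float, topic_coverage: float) -> float:
    """Weighted sum from the formula above (weights 0.20 / 0.20 / 0.60)."""
    return 0.20 * content_depth + 0.20 * readability + 0.60 * topic_coverage

# Hypothetical normalized (0-100) dimension scores for one state's manual:
# 0.20*80 + 0.20*70 + 0.60*90 = 16 + 14 + 54 = 84
print(weighted_score(80.0, 70.0, 90.0))
```

Topic coverage dominates at 60%, so a manual that thoroughly covers the seven keyword categories can outrank a longer or more readable one.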
10 standard questions (BAC limits, school zone speeds, minimum age, DUI penalties, etc.) are queried against all 51 states via the RAG pipeline. Reranker confidence scores surface which states document each topic clearly and which have coverage gaps.
# Pre-compute stats (run once)
python -m analysis.compute_stats
python -m analysis.cross_state_compare
# View dashboard
streamlit run frontend/app.py
# Navigate to "Manual Comparison" in the sidebar

Developed by Yash Mahajan under the guidance of Dr. Rakesh Mahto and Dr. Deepak Sharma.