A lightweight Agentic RAG (Retrieval-Augmented Generation) system built from scratch with LangGraph and FAISS.
Instead of a fixed retrieve → generate pipeline, this agent reasons about its own retrieval:
User Query
↓
Query Rewriter — makes vague questions retrieval-friendly
↓
Retriever — fetches top-k chunks from FAISS
↓
Relevance Grader — keeps only useful chunks
↓ (relevant found)          ↓ (none found, retry allowed)
Generator                   Query Rewriter ←─ loop
↓
Final Answer
| Feature | Detail |
|---|---|
| Self-correcting retrieval | If the first retrieval returns irrelevant chunks, the agent rewrites the query and tries again (max 2 retries) |
| Conversation memory | History is passed to every node so follow-up questions work naturally |
| Relevance grading | An LLM grades each chunk before it reaches the generator — no noisy context |
| Local embeddings | all-MiniLM-L6-v2 via HuggingFace — no embedding API calls needed |
| Swappable LLM | Works with OpenAI, Groq (free), or Ollama (fully local) |
| Gradio UI | One-command chat interface at http://localhost:7860 |
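The relevance-grading row above can be pictured as a filter that sits between the retriever and the generator. The sketch below is only an illustration: `filter_relevant`, the `Grader` signature, and `keyword_grader` are hypothetical names, and the keyword check is a toy stand-in for the LLM call the project actually uses.

```python
from typing import Callable, List

# Hypothetical signature: (question, chunk) -> True if the chunk helps answer it.
Grader = Callable[[str, str], bool]


def filter_relevant(question: str, chunks: List[str], grade: Grader) -> List[str]:
    """Keep only the chunks the grader marks as relevant, preserving order."""
    return [chunk for chunk in chunks if grade(question, chunk)]


def keyword_grader(question: str, chunk: str) -> bool:
    """Toy stand-in for the LLM grader: true if a question word (>3 letters) appears in the chunk."""
    question_words = {w.strip("?.,!").lower() for w in question.split()}
    keywords = {w for w in question_words if len(w) > 3}
    chunk_words = {w.strip("?.,!").lower() for w in chunk.split()}
    return bool(keywords & chunk_words)


chunks = [
    "FAISS is a library for vector similarity search.",
    "Gradio builds web demos for Python functions.",
]
kept = filter_relevant("What does FAISS search?", chunks, keyword_grader)
print(kept)
```

Passing the grader in as a callable keeps the filter independent of any one LLM provider. An empty result is the signal that would trigger a query rewrite.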
git clone https://github.com/<your-username>/agentic-rag-agent.git
cd agentic-rag-agent
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY (or Groq/Ollama; see .env.example)
Drop .pdf, .txt, or .md files into data/sample_docs/.
Two sample documents about RAG and LangGraph are already included for testing.
python ingest.py
This chunks your documents, embeds them, and saves a FAISS index to data/faiss_index/.
You only need to run it once; re-run it with --force after adding new documents.
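To make the chunking step concrete, here is a minimal sketch of overlapping windows. It is not the code in ingest.py: `chunk_words` is a hypothetical helper, and it counts whitespace-separated words as a rough stand-in for tokens. The defaults mirror the CHUNK_SIZE and CHUNK_OVERLAP values listed in the configuration table.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 512, overlap: int = 64) -> List[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
print(pieces)
```

With `chunk_size=4` and `overlap=1`, each window starts on the last word of the previous one, so a sentence that straddles a boundary appears in full in at least one chunk as long as it is shorter than the overlap.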
python app.py
Open http://localhost:7860 and start chatting.
agentic-rag-agent/
│
├── app.py # Gradio chat UI — entry point
├── ingest.py # CLI script to build the FAISS index
├── requirements.txt
├── .env.example # Copy to .env and fill in your keys
│
├── src/
│   ├── rag_agent.py        # LangGraph graph definition + node functions
│   ├── retriever.py        # FAISS ingestion & retrieval
│   ├── state.py            # RAGState TypedDict
│   └── prompts.py          # All prompt templates in one place
│
└── data/
    ├── sample_docs/        # Put your source documents here
    │   ├── intro_to_rag.md
    │   └── langgraph_guide.md
    └── faiss_index/        # Auto-generated by ingest.py (git-ignored)
Edit .env to switch providers — no code changes needed.
OpenAI (default)
OPENAI_API_KEY=sk-...
LLM_MODEL=gpt-4o-mini

Groq (free tier, fast)
OPENAI_API_KEY=gsk_...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
LLM_MODEL=llama-3.3-70b-versatile

Ollama (fully local, no API key)
OPENAI_API_KEY=ollama
OPENAI_BASE_URL=http://localhost:11434/v1
LLM_MODEL=qwen2.5:7b

The agent is defined in src/rag_agent.py as a StateGraph.
           ┌──────────────┐
START ───► │query_rewriter│ ◄──── (retry loop)
           └──────┬───────┘
                  │
           ┌──────▼───────┐
           │  retriever   │
           └──────┬───────┘
                  │
           ┌──────▼───────┐
           │  relevance   │
           │    check     │
           └──────┬───────┘
                  │
       ┌──────────┴──────────┐
   relevant?            not relevant?
       │                     │
┌──────▼───────┐      (rewrites < 2)
│  generator   │             │
└──────┬───────┘     back to rewriter
       │
      END
Each node is a plain Python function that receives the shared RAGState and returns a partial update.
Conditional routing (route_after_relevance) decides whether to generate or retry.
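As a rough illustration of the two ideas above, the sketch below shows a node that returns a partial update and a routing function that picks the next node. The state field names, the `MAX_REWRITES` constant, and the fallback to the generator once retries run out are all assumptions made for this example; the real schema lives in src/state.py and the real logic in src/rag_agent.py.

```python
from typing import List, TypedDict

MAX_REWRITES = 2  # mirrors the "rewrites < 2" condition in the diagram


class RAGState(TypedDict, total=False):
    # Illustrative field names only, not the actual schema in state.py.
    question: str
    rewritten_query: str
    relevant_chunks: List[str]
    rewrite_count: int
    answer: str


def query_rewriter(state: RAGState) -> dict:
    """Node: return only the keys it changes (a partial update)."""
    query = state["question"].strip()  # a real node would call the LLM here
    return {
        "rewritten_query": query,
        "rewrite_count": state.get("rewrite_count", 0) + 1,
    }


def route_after_relevance(state: RAGState) -> str:
    """Return the name of the next node based on the grading result."""
    if state.get("relevant_chunks"):
        return "generator"
    if state.get("rewrite_count", 0) < MAX_REWRITES:
        return "query_rewriter"
    # Assumption: once retries are exhausted, generate with whatever is available.
    return "generator"


state: RAGState = {"question": "  what is RAG? ", "relevant_chunks": []}
state.update(query_rewriter(state))  # merge the partial update into the state
next_node = route_after_relevance(state)
print(next_node)
```

In LangGraph, a function like `route_after_relevance` is typically registered with `add_conditional_edges`, which maps each returned string to a node name. The merge done by hand here with `state.update` is what the graph runtime does after every node.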
All tunable parameters live in .env:
| Variable | Default | Description |
|---|---|---|
| LLM_MODEL | gpt-4o-mini | LLM model name |
| EMBED_MODEL | all-MiniLM-L6-v2 | HuggingFace embedding model |
| TOP_K | 4 | Number of chunks to retrieve |
| CHUNK_SIZE | 512 | Tokens per chunk |
| CHUNK_OVERLAP | 64 | Overlap between consecutive chunks |
| DATA_DIR | data/sample_docs | Directory of source documents |
| INDEX_DIR | data/faiss_index | Where the FAISS index is saved |
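One way to read these variables with their documented defaults is sketched below. The `Settings` dataclass and `load_settings` function are hypothetical names, not part of the repository. The sketch also assumes the .env file has already been loaded into the process environment, for example with python-dotenv.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    llm_model: str
    embed_model: str
    top_k: int
    chunk_size: int
    chunk_overlap: int
    data_dir: str
    index_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from `env`, falling back to the documented default."""
    return Settings(
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),
        embed_model=env.get("EMBED_MODEL", "all-MiniLM-L6-v2"),
        top_k=int(env.get("TOP_K", "4")),
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "64")),
        data_dir=env.get("DATA_DIR", "data/sample_docs"),
        index_dir=env.get("INDEX_DIR", "data/faiss_index"),
    )


# Passing a plain dict instead of os.environ makes the override easy to see.
settings = load_settings({"TOP_K": "8"})
print(settings)
```

Parsing the numeric values once at startup means a typo such as `TOP_K=four` fails immediately with a `ValueError` instead of surfacing later during retrieval.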
- LangGraph — graph-based agent orchestration
- LangChain — LLM abstraction, document loaders
- FAISS — local vector similarity search
- HuggingFace Sentence Transformers — local embeddings
- Gradio — chat UI
MIT — free to use, modify, and distribute.
Built by Vamshi as part of a hands-on AI engineering portfolio.