An AI-powered PDF ChatBot with a dark chat UI. Upload any PDF, ask questions in a conversational interface, and get grounded answers powered by Llama 3.3 70B via OpenRouter.
Built as a portfolio project - fast, lightweight, zero local GPU required.
- Upload & parse any PDF with PyMuPDF
- Smart text chunking via LangChain
- Semantic embeddings using
all-MiniLM-L6-v2 - Vector storage & retrieval with Pinecone (serverless)
- Answers from Llama 3.3 70B via OpenRouter (free tier)
- Persistent chat history with full conversation UI
- Dark GitHub-inspired theme
- Auto-retry on rate limits, batch upsert for large PDFs
PDF Upload
|
v
PyMuPDF -- extract raw text
|
v
LangChain RecursiveCharacterTextSplitter -- chunk (500 tokens, 50 overlap)
|
v
SentenceTransformers all-MiniLM-L6-v2 -- embed chunks -> 384-dim vectors
|
v
Pinecone Serverless -- store & query vectors (top-5 retrieval)
|
v
OpenRouter -> Llama 3.3 70B Instruct -- generate grounded answer
|
v
Streamlit Dark Chat UI -- display in conversation thread
| Layer | Technology |
|---|---|
| Frontend | Streamlit with custom dark CSS |
| PDF Parsing | PyMuPDF |
| Text Splitting | LangChain RecursiveCharacterTextSplitter |
| Embeddings | SentenceTransformers all-MiniLM-L6-v2 |
| Vector DB | Pinecone Serverless (free tier) |
| LLM | Llama 3.3 70B via OpenRouter (free) |
| API Client | OpenAI Python SDK |
app/
├── app.py # Streamlit UI entry point
├── config.py # All constants (model names, chunk sizes, etc.)
├── pdf_utils.py # PDF extraction and text chunking
├── embedder.py # SentenceTransformer loading and vector encoding
├── vector_store.py # Pinecone connect, upsert, and query
├── llm.py # OpenRouter API call with retry logic
├── .streamlit/
│ └── config.toml # Dark theme + server config for Streamlit Cloud
├── .env.example # Template -- copy to .env and fill in keys
├── .gitignore # Excludes .env, .venv, temp.pdf, __pycache__
├── requirements.txt # Minimal direct dependencies (7 packages)
└── README.md # This file
git clone https://github.com/RohitDSonawane/PDFChatBot.git
cd PDFChatBotpython -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activatepip install -r requirements.txtcp .env.example .envThen edit .env and add your keys:
PINECONE_API_KEY=your-pinecone-api-key
OPENROUTER_API_KEY=your-openrouter-api-key- Get Pinecone key: app.pinecone.io -> API Keys
- Get OpenRouter key (free): openrouter.ai -> Keys
streamlit run app.py- Push this repo to GitHub (public)
- Go to share.streamlit.io -> sign in with GitHub
- Click New app -> select this repo -> set
app.pyas entry point - Go to Advanced settings -> Secrets and add:
PINECONE_API_KEY = "your-pinecone-api-key"
OPENROUTER_API_KEY = "your-openrouter-api-key"- Click Deploy -- live in ~2 minutes
Rohit Sonawane
Star this repo if you found it useful! Feedback, issues, and PRs are welcome.