PDF ChatBot - Chat with Any PDF using AI

An AI-powered PDF ChatBot with a dark chat UI. Upload any PDF, ask questions in a conversational interface, and get grounded answers powered by Llama 3.3 70B via OpenRouter.

Built as a portfolio project - fast, lightweight, zero local GPU required.

Features

Upload & parse any PDF with PyMuPDF
Smart text chunking via LangChain
Semantic embeddings using all-MiniLM-L6-v2
Vector storage & retrieval with Pinecone (serverless)
Answers from Llama 3.3 70B via OpenRouter (free tier)
Persistent chat history with full conversation UI
Dark GitHub-inspired theme
Auto-retry on rate limits, batch upsert for large PDFs

Architecture

PDF Upload
    |
    v
PyMuPDF -- extract raw text
    |
    v
LangChain RecursiveCharacterTextSplitter -- chunk (500 tokens, 50 overlap)
    |
    v
SentenceTransformers all-MiniLM-L6-v2 -- embed chunks -> 384-dim vectors
    |
    v
Pinecone Serverless -- store & query vectors (top-5 retrieval)
    |
    v
OpenRouter -> Llama 3.3 70B Instruct -- generate grounded answer
    |
    v
Streamlit Dark Chat UI -- display in conversation thread

Tech Stack

Layer	Technology
Frontend	Streamlit with custom dark CSS
PDF Parsing	PyMuPDF
Text Splitting	LangChain RecursiveCharacterTextSplitter
Embeddings	SentenceTransformers all-MiniLM-L6-v2
Vector DB	Pinecone Serverless (free tier)
LLM	Llama 3.3 70B via OpenRouter (free)
API Client	OpenAI Python SDK

Project Structure

app/
├── app.py                  # Streamlit UI entry point
├── config.py               # All constants (model names, chunk sizes, etc.)
├── pdf_utils.py            # PDF extraction and text chunking
├── embedder.py             # SentenceTransformer loading and vector encoding
├── vector_store.py         # Pinecone connect, upsert, and query
├── llm.py                  # OpenRouter API call with retry logic
├── .streamlit/
│   └── config.toml         # Dark theme + server config for Streamlit Cloud
├── .env.example            # Template -- copy to .env and fill in keys
├── .gitignore              # Excludes .env, .venv, temp.pdf, __pycache__
├── requirements.txt        # Minimal direct dependencies (7 packages)
└── README.md               # This file

Local Setup

1. Clone the repo

git clone https://github.com/RohitDSonawane/PDFChatBot.git
cd PDFChatBot

2. Create and activate a virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Set up environment variables

cp .env.example .env

Then edit .env and add your keys:

PINECONE_API_KEY=your-pinecone-api-key
OPENROUTER_API_KEY=your-openrouter-api-key

Get Pinecone key: app.pinecone.io -> API Keys
Get OpenRouter key (free): openrouter.ai -> Keys

5. Run the app

streamlit run app.py

Deploy on Streamlit Community Cloud (Free)

Push this repo to GitHub (public)
Go to share.streamlit.io -> sign in with GitHub
Click New app -> select this repo -> set app.py as entry point
Go to Advanced settings -> Secrets and add:

PINECONE_API_KEY = "your-pinecone-api-key"
OPENROUTER_API_KEY = "your-openrouter-api-key"

Click Deploy -- live in ~2 minutes

Author

Rohit Sonawane

Star this repo if you found it useful! Feedback, issues, and PRs are welcome.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PDF ChatBot - Chat with Any PDF using AI

Features

Architecture

Tech Stack

Project Structure

Local Setup

1. Clone the repo

2. Create and activate a virtual environment

3. Install dependencies

4. Set up environment variables

5. Run the app

Deploy on Streamlit Community Cloud (Free)

Author

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

PDF ChatBot - Chat with Any PDF using AI

Features

Architecture

Tech Stack

Project Structure

Local Setup

1. Clone the repo

2. Create and activate a virtual environment

3. Install dependencies

4. Set up environment variables

5. Run the app

Deploy on Streamlit Community Cloud (Free)

Author