
PDF ChatBot - Chat with Any PDF using AI


An AI-powered PDF ChatBot with a dark chat UI. Upload any PDF, ask questions in a conversational interface, and get grounded answers powered by Llama 3.3 70B via OpenRouter.

Built as a portfolio project: fast, lightweight, and no local GPU required.


Features

  • Upload & parse any PDF with PyMuPDF
  • Smart text chunking via LangChain
  • Semantic embeddings using all-MiniLM-L6-v2
  • Vector storage & retrieval with Pinecone (serverless)
  • Answers from Llama 3.3 70B via OpenRouter (free tier)
  • Persistent chat history with full conversation UI
  • Dark GitHub-inspired theme
  • Auto-retry on rate limits, batch upsert for large PDFs
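The rate-limit retry and batch upsert mentioned above can be sketched in a few lines. This is an illustrative, dependency-free version, not the repo's actual `llm.py`/`vector_store.py` code; function names and the exception type are stand-ins:

```python
import time
import random

def with_retry(fn, max_attempts=4, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on rate-limit errors.

    Sketch only: a real client would catch the API's specific
    rate-limit exception instead of a generic RuntimeError.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RuntimeError:  # stand-in for an HTTP 429 rate-limit error
            if attempt == max_attempts - 1:
                raise
            # exponential backoff with a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

def batched(items, batch_size=100):
    """Yield fixed-size batches so vectors from large PDFs upsert in chunks."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```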

Architecture

PDF Upload
    |
    v
PyMuPDF -- extract raw text
    |
    v
LangChain RecursiveCharacterTextSplitter -- chunk (500 tokens, 50 overlap)
    |
    v
SentenceTransformers all-MiniLM-L6-v2 -- embed chunks -> 384-dim vectors
    |
    v
Pinecone Serverless -- store & query vectors (top-5 retrieval)
    |
    v
OpenRouter -> Llama 3.3 70B Instruct -- generate grounded answer
    |
    v
Streamlit Dark Chat UI -- display in conversation thread
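The chunk and retrieve stages above can be illustrated with a minimal, dependency-free sketch. The real app uses LangChain's token-based splitter and Pinecone's index; this version splits by characters and does brute-force cosine similarity, but the sliding-window and top-k ideas are the same:

```python
import math

def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping windows (characters here; the app
    chunks by tokens, but the sliding-window mechanics are identical)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, k=5):
    """Return indices of the k most similar chunk vectors (Pinecone does
    this at scale; brute force is fine for an illustration)."""
    scored = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return scored[:k]
```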

Tech Stack

| Layer | Technology |
| --- | --- |
| Frontend | Streamlit with custom dark CSS |
| PDF Parsing | PyMuPDF |
| Text Splitting | LangChain RecursiveCharacterTextSplitter |
| Embeddings | SentenceTransformers all-MiniLM-L6-v2 |
| Vector DB | Pinecone Serverless (free tier) |
| LLM | Llama 3.3 70B via OpenRouter (free) |
| API Client | OpenAI Python SDK |
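The "grounded" part of the answers comes from stuffing the retrieved chunks into the prompt before calling the LLM. A rough sketch of that step (the exact prompt wording in `llm.py` will differ):

```python
def build_grounded_prompt(question, retrieved_chunks):
    """Assemble a context-stuffed prompt so the LLM answers only from
    the PDF. The wording is illustrative, not the repo's actual prompt."""
    context = "\n\n".join(
        f"[Chunk {i + 1}]\n{chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting string is then sent as a chat message through the OpenAI Python SDK pointed at OpenRouter's endpoint.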

Project Structure

app/
├── app.py                  # Streamlit UI entry point
├── config.py               # All constants (model names, chunk sizes, etc.)
├── pdf_utils.py            # PDF extraction and text chunking
├── embedder.py             # SentenceTransformer loading and vector encoding
├── vector_store.py         # Pinecone connect, upsert, and query
├── llm.py                  # OpenRouter API call with retry logic
├── .streamlit/
│   └── config.toml         # Dark theme + server config for Streamlit Cloud
├── .env.example            # Template -- copy to .env and fill in keys
├── .gitignore              # Excludes .env, .venv, temp.pdf, __pycache__
├── requirements.txt        # Minimal direct dependencies (7 packages)
└── README.md               # This file
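Based on the values quoted elsewhere in this README, `config.py` plausibly centralizes constants like the following. The constant names and the OpenRouter model id are guesses (check the file itself); the values match the ones stated in this document:

```python
# Hypothetical config.py sketch -- names are illustrative, values are
# the ones quoted in this README.
EMBED_MODEL_NAME = "all-MiniLM-L6-v2"   # SentenceTransformers model
EMBED_DIM = 384                         # all-MiniLM-L6-v2 output size
CHUNK_SIZE = 500                        # tokens per chunk
CHUNK_OVERLAP = 50                      # tokens of overlap between chunks
TOP_K = 5                               # chunks retrieved per query
LLM_MODEL = "meta-llama/llama-3.3-70b-instruct"  # assumed OpenRouter model id
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
```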

Local Setup

1. Clone the repo

git clone https://github.com/RohitDSonawane/PDFChatBot.git
cd PDFChatBot

2. Create and activate a virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Set up environment variables

cp .env.example .env        # macOS/Linux
copy .env.example .env      # Windows

Then edit .env and add your keys:

PINECONE_API_KEY=your-pinecone-api-key
OPENROUTER_API_KEY=your-openrouter-api-key
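The app presumably loads these with a dotenv-style loader, and the .env format itself is simple. A minimal parser shows what is going on (illustration only; use python-dotenv in real projects):

```python
def parse_env(text):
    """Parse simple KEY=value lines from .env-style text, skipping blank
    lines and # comments, and stripping surrounding quotes from values."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env
```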

5. Run the app

streamlit run app.py

Deploy on Streamlit Community Cloud (Free)

  1. Push this repo to GitHub (public)
  2. Go to share.streamlit.io -> sign in with GitHub
  3. Click New app -> select this repo -> set app.py as entry point
  4. Go to Advanced settings -> Secrets and add:

     PINECONE_API_KEY = "your-pinecone-api-key"
     OPENROUTER_API_KEY = "your-openrouter-api-key"

  5. Click Deploy -- live in ~2 minutes
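On Streamlit Cloud the keys arrive via st.secrets, while locally they come from .env/environment variables. A small lookup helper can cover both; this is a sketch (st.secrets is mapping-like, so it is modeled as a plain dict here so the example runs without Streamlit installed):

```python
import os

def get_secret(name, secrets=None):
    """Look up a key in a Streamlit-style secrets mapping first, then in
    environment variables. `secrets` stands in for st.secrets."""
    if secrets and name in secrets:
        return secrets[name]
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"Missing required secret: {name}")
    return value
```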

Author

Rohit Sonawane


Star this repo if you found it useful! Feedback, issues, and PRs are welcome.
