An AI-powered PDF question-answering system built with Retrieval-Augmented Generation (RAG), ChromaDB, Sentence Transformers, and Streamlit.
This application allows users to upload any PDF and ask questions based on its content.
- 📂 Upload any PDF file
- ✂️ Automatic text extraction from PDF
- 🧠 Semantic chunk embedding using Sentence Transformers
- 🗄 Vector storage using ChromaDB
- 🔎 Similarity-based retrieval
- 💬 Question answering from uploaded document
- 🌐 Streamlit web interface
- Python
- Streamlit
- ChromaDB
- Sentence Transformers
- PyPDF
RAG_Project
│
├── app.py            # Streamlit UI
├── pdf_utils.py      # PDF text extraction
├── main.py           # Basic RAG script
├── requirements.txt
├── README.md
├── .gitignore
└── data/
- Clone the repository
git clone https://github.com/Varshakaleeswaran/RAG_Project.git
cd RAG_Project
- Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
- Install dependencies
pip install -r requirements.txt
streamlit run app.py
Open your browser at the local URL that Streamlit prints in the terminal.
- User uploads a PDF
- Text is extracted using PyPDF
- Text is split into chunks
- Each chunk is converted into embeddings
- Embeddings are stored in ChromaDB
- User asks a question
- The query is embedded and compared against the stored chunk embeddings
- The most similar chunks are returned as the answer
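
The chunk, embed, and retrieve steps above can be sketched with the standard library alone. This is a minimal illustration, not the code in `app.py`: the function names `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical, the word-count vector is a toy stand-in for a Sentence Transformers embedding, and the plain Python list stands in for a ChromaDB collection. The chunk size and overlap values are also assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=1):
    """Return the top_k chunks most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Example: chunk a short document, then look up the best match for a question.
document = (
    "Streamlit renders the upload form. "
    "ChromaDB stores one vector per chunk. "
    "PyPDF extracts raw text from each page."
)
doc_chunks = chunk_text(document, chunk_size=6, overlap=0)
print(retrieve("Which component stores vectors?", doc_chunks))
```

Overlapping chunks reduce the chance that a sentence split across a chunk boundary is lost to retrieval. A real embedding model matches on meaning rather than exact word overlap, so it handles paraphrased questions that this toy version would miss.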
- 📚 Academic Notes Assistant
- 📄 Research Paper Analyzer
- 🤖 Technical Documentation Assistant
- 📑 Legal Document Search
- 📊 Company Policy QA System
- Persistent ChromaDB storage
- OpenAI/GPT-based answer generation
- Chat history memory
- Multi-PDF support
- Deployment on Streamlit Cloud
Varsha Kaleeswaran
Give it a star on GitHub!
