
NCERT RAG-based QA System

A Retrieval-Augmented Generation (RAG) system for intelligent question answering over NCERT textbooks (Classes 11 and 12). The project integrates LangChain, the Qdrant vector store, a Dockerized Ollama LLM backend, and a PDF data pipeline to deliver accurate, contextual answers to academic questions.


Objective

To build a localized and private QA system using open-source LLMs and RAG methodology, suitable for educational and contextual querying of NCERT content.


Project Structure

```
NCERT_RAG/
├── NCERT/                     # Subject-wise NCERT PDFs (Class 11 & 12)
│   ├── Class11_Chemistry/
│   ├── Class12_Physics/
│   └── ...
├── data_ingestion.py          # Main driver to ingest PDFs into the vector DB
├── data_loader.py             # Loads and parses PDFs
├── ncert_loader.py            # Handles recursive loading from subject folders
├── ingestion_to_qdrant.py     # Document embedding and ingestion into Qdrant
├── retriever.py               # Query interface; fetches and answers from the vector store
├── requirements.txt           # Python dependencies
└── README.md                  # Project documentation
```

🛠️ Tools & Libraries Used

| Tool/Library | Purpose |
| --- | --- |
| LangChain | Framework for building LLM-powered applications |
| Qdrant | Vector database for storing embedded document vectors |
| Ollama | Runs open-source LLMs like LLaMA 3 locally via Docker |
| Docker | Containerized deployment of LLMs (backend requirement for Ollama) |
| PyMuPDF (`fitz`) | Parses and extracts text content from PDF files |

Flowchart

```mermaid
graph TD
    A[Start] --> B[Start Docker + Ollama]
    B --> C[Run data_ingestion.py]
    C --> D[Parse PDFs using ncert_loader.py and data_loader.py]
    D --> E[Generate embeddings]
    E --> F[Store in Qdrant via ingestion_to_qdrant.py]
    F --> G[Run retriever.py]
    G --> H[Accept user query]
    H --> I[Retrieve top documents from Qdrant]
    I --> J["Answer using Ollama LLM (LangChain)"]
    J --> K[Display final answer]
```

Pre-Requisites

1. Install Docker & Start Ollama

Ollama must be installed and running inside Docker to serve the LLaMA 3 model. Note that `ollama run` has to be executed inside the container (via `docker exec`) unless the Ollama CLI is also installed on the host:

```shell
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:latest
docker exec -it ollama ollama run llama3
```
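Before ingesting or querying, it can help to verify that the Dockerized Ollama server is actually reachable. The sketch below is illustrative and not part of the repository's scripts: it probes Ollama's default base URL and shows the JSON body shape used by Ollama's `/api/generate` endpoint.

```python
import json
import urllib.request

# Default port published by the docker run command above.
OLLAMA_URL = "http://localhost:11434"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_is_up(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if the Ollama server responds at its base URL."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200  # the root path replies "Ollama is running"
    except OSError:
        return False

if __name__ == "__main__":
    print(json.dumps(build_generate_payload("llama3", "Say hello.")))
    print("Ollama reachable:", ollama_is_up())
```

Running this before the ingestion step gives a quick yes/no on whether the container is up and the port mapping is correct.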

2. Install Python Libraries

Create a virtual environment and install dependencies:

```shell
conda create -n ncert_rag python=3.10 -y
conda activate ncert_rag
pip install -r requirements.txt
```

requirements.txt:

```
langchain>=0.1.0
qdrant-client>=1.6.0
openai>=1.0.0
PyMuPDF
ollama
```

Running the Application

Step 1: Start Ollama (must be running before everything else)

```shell
docker start <container_id>
docker exec -it <container_id> ollama run llama3
```

Step 2: Ingest Data

```shell
python data_ingestion.py
```

This step recursively loads all PDFs from the NCERT/ directory, generates embeddings, and stores them in Qdrant.
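The exact splitting logic lives in the ingestion scripts and is not reproduced here; as an illustration only, the sketch below shows the general idea of chunking extracted text into overlapping windows before embedding (LangChain's text splitters do something similar, with extra handling for sentence boundaries). `chunk_text` is a hypothetical helper, not a function from this repository.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    The overlap preserves context that would otherwise be cut off
    at chunk borders.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# A 1200-character document yields windows [0:500], [450:950], [900:1200].
parts = chunk_text("a" * 1200)
print([len(p) for p in parts])  # [500, 500, 300]
```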

Step 3: Query the Data

```shell
python retriever.py
```

Enter your query when prompted and get accurate responses using the local LLM.
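The prompt template used by retriever.py is not reproduced here, so the following is only a sketch of the final step of any RAG query: stitching the top retrieved passages and the user's question into a single grounded prompt for the LLM. `build_rag_prompt` is a hypothetical helper for illustration.

```python
def build_rag_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved passages and a question into one grounded prompt."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "Was India isolated from the world 2000 years ago?",
    ["Trade routes linked India to Central Asia.",
     "Globalization affected independent India."],
)
print(prompt)
```

Numbering the passages (`[1]`, `[2]`, …) makes it easy to ask the model to cite which retrieved chunk supported its answer.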


Clean Output Tips

To avoid raw metadata and show only the final answer: In retriever.py, replace:

print("\n######### Answer #########:\n", answer)

with:

print("\n######### Answer #########:\n", answer['output_text'])

Sample Query

Question: Was India isolated from the world 2000 years ago?
Answer: According to the context, it is mentioned that "Globalization and Social Change" affected independent India. This implies that India was not isolated from the world, as global connections existed in the past.

📌 Notes

- Ollama must be running in the background before you run the ingestion or query steps.
- All documents are loaded from the subject-wise NCERT/ folder; make sure the PDFs are properly placed.
- Only Class 11 and 12 PDFs are supported by default. Modify ncert_loader.py to extend this.
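To extend support beyond Classes 11 and 12, the folder filter in ncert_loader.py would need widening. The repository's actual loader code isn't shown here, so the sketch below only illustrates one plausible shape: recursive PDF discovery filtered by the class prefix of each subject folder (`find_pdfs` and `is_supported_class` are hypothetical names).

```python
from pathlib import Path

def is_supported_class(folder_name: str,
                       allowed: tuple[str, ...] = ("Class11", "Class12")) -> bool:
    """True if a subject folder like 'Class11_Chemistry' matches an allowed class."""
    return folder_name.startswith(allowed)

def find_pdfs(root: str) -> list[Path]:
    """Recursively collect PDFs whose parent folder is a supported class."""
    return [p for p in sorted(Path(root).rglob("*.pdf"))
            if is_supported_class(p.parent.name)]
```

Under this layout, adding `"Class10"` to `allowed` would pick up Class 10 folders without touching the directory walk itself.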

License

This project is intended for learning and educational use; customize it for your own use case. Licensed under the MIT License. See the LICENSE file for details.


For any feedback or contributions, please open an issue or submit a pull request!
