Try the application live: https://bbkhosseini--two-stage-conrag-run.modal.run/
Experience the Two-Stage Consecutive RAG system in action! Upload your PDFs and ask questions directly in your browser.
The Two-Stage Consecutive RAG pipeline targets both precision and scalability with a sequential retrieval strategy: a keyword search first narrows the corpus to the most relevant documents, and a semantic search then runs only within those documents, which keeps computational overhead low. Users can upload PDF documents and interactively ask questions about their content. The system processes the documents, retrieves the relevant passages in two stages, and answers based on the uploaded content.
For a detailed explanation of the system, its design, and performance evaluation, check out the complete blog post: Two-Stage Consecutive RAG for Document QA.
- Document Loading and Chunking: Users upload PDFs, which are split into both small and large text chunks. Small chunks capture specific information and keyword matches, while large chunks provide broader context.
- Vector Store Creation: Large text chunks are embedded using a sentence transformer model and indexed in a vector database (ChromaDB) for efficient semantic search.
- Question Shortening: The user query is condensed into essential keywords using an LLM.
- BM25 Keyword Search: A keyword search is performed using the BM25 algorithm on small chunks and the condensed keywords. A Cross-Encoder is used to rerank retrieved chunks based on semantic similarity.
- DRS Calculation: Aggregates small chunk scores to calculate a document retrieval score (DRS) and select top relevant documents.
- Semantic Search: Performs semantic search on large chunks within the selected documents; Cross-Encoder reranking then further refines the relevance of the retrieved chunks.
- Context Aggregation: Aggregates and ranks both small and large chunks based on their scores to form the final context.
- Answer Generation: The system generates a response based on the constructed context and the input query.
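The first stage of the pipeline above (small chunks, BM25 keyword search, DRS, document selection) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names (`split_words`, `bm25_scores`, `document_retrieval_scores`), the word-based chunking, and the DRS formula (sum of each document's top small-chunk scores) are all hypothetical, and the Cross-Encoder reranking and LLM query shortening are left out.

```python
import math
from collections import Counter, defaultdict


def split_words(text, size):
    """Split text into chunks of `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bm25_scores(query_terms, chunks, k1=1.5, b=0.75):
    """Score each chunk against the query terms with Okapi BM25."""
    tokenized = [c.lower().split() for c in chunks]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of chunks containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def document_retrieval_scores(doc_ids, scores, top_n=2):
    """Assumed DRS: sum of each document's top_n small-chunk scores."""
    per_doc = defaultdict(list)
    for doc_id, score in zip(doc_ids, scores):
        per_doc[doc_id].append(score)
    return {d: sum(sorted(s, reverse=True)[:top_n]) for d, s in per_doc.items()}


# Toy corpus standing in for text extracted from uploaded PDFs.
docs = {
    "rates.pdf": "central bank raised interest rates again while inflation stayed high this quarter",
    "tech.pdf": "chip makers reported strong earnings as demand for data centers grew",
}
small_chunks, owners = [], []
for doc_id, text in docs.items():
    for chunk in split_words(text, size=6):
        small_chunks.append(chunk)
        owners.append(doc_id)

keywords = ["interest", "rates", "inflation"]  # stand-in for the LLM-shortened query
drs = document_retrieval_scores(owners, bm25_scores(keywords, small_chunks))
top_docs = sorted(drs, key=drs.get, reverse=True)[:1]
print(top_docs)
```

Only the documents in `top_docs` would then be passed to the semantic search over large chunks, which is what keeps the second stage cheap on a large corpus.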
For details on backend components and architecture, see backend/README.md.
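As a rough illustration of the context aggregation step in the pipeline above, the sketch below merges scored small and large chunks into one ranked context. The `ScoredChunk` type, the `build_context` helper, and the assumption that both chunk types carry scores on a comparable scale (for example, from the same Cross-Encoder) are hypothetical and may differ from what the backend does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    text: str
    score: float  # assumed comparable across small and large chunks
    kind: str     # "small" or "large"


def build_context(small, large, max_chunks=3):
    """Merge both chunk lists, drop duplicate texts, keep the best-scoring ones."""
    best = {}
    for chunk in [*small, *large]:
        if chunk.text not in best or chunk.score > best[chunk.text].score:
            best[chunk.text] = chunk
    ranked = sorted(best.values(), key=lambda c: c.score, reverse=True)
    return "\n\n".join(c.text for c in ranked[:max_chunks])


small = [
    ScoredChunk("Rates rose 25 bps.", 0.91, "small"),
    ScoredChunk("Inflation stayed high.", 0.40, "small"),
]
large = [
    ScoredChunk("The central bank raised rates by 25 bps, citing inflation.", 0.85, "large"),
    ScoredChunk("Chip makers reported strong earnings.", 0.10, "large"),
]
context = build_context(small, large, max_chunks=3)
print(context)
```

Capping the context at `max_chunks` is one simple way to keep the prompt for answer generation bounded; a token budget would be a natural alternative.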
two-stage-conrag/
├── backend/                    # Core logic: PDF manager, retrievers, QA chains, settings
│   ├── my_lib/                 # Modular pipeline components
│   ├── settings.py
│   ├── utils.py
│   └── README.md
├── frontend/                   # Streamlit interface
│   ├── app.py
│   ├── helper_gui.py
│   └── static/                 # Static assets
├── vector_store/               # Embedding DB client and index config
├── configs/                    # YAML configuration files
│   └── config.yaml
├── data/                       # Sample and full-scale PDF sets
│   ├── sample_pdfs/
│   └── uploads/                # Temporary uploaded files
├── notebooks/                  # Prototyping and experimentation
├── .env_example                # Template for secrets and API keys
├── Dockerfile                  # Production-ready Dockerfile (Poetry-free runtime)
├── Makefile                    # CLI shortcuts for dev/test/deploy
├── requirements.txt            # Streamlit deployment requirements
├── requirements-local.txt      # Local implementation requirements
├── requirements-fallback.txt   # Fallback for environments without Poetry
├── pyproject.toml              # Poetry project definition
├── poetry.lock                 # Locked dependencies
├── pytest.ini                  # Test configuration
└── README.md                   # Project overview and instructions
git clone https://github.com/yourusername/two-stage-conrag.git
cd two-stage-conrag
Ensure Python 3.12.0 is installed. If needed, use pyenv:
pyenv install 3.12.0
pyenv local 3.12.0
Copy the template file and set your API keys:
cp .env_example .env
Then edit .env and add:
OPENAI_API_KEY=your-key-here
LANGCHAIN_API_KEY=your-langsmith-key # optional
DEPLOYMENT_MODE=local # or 'cloud'
DEBUG_MODE=false
IN_MEMORY=true # true for in-memory vector store, false for persistent
Poetry must be installed on your system. Install it via the official guide or with:
curl -sSL https://install.python-poetry.org | python3 -
Once Poetry is available:
make install # Production dependencies
make install-dev # Development dependencies
make install-cloud # Cloud deployment setup
This installs dependencies into an isolated virtual environment based on pyproject.toml.
To activate the environment manually:
poetry shell
If you don't have Poetry or need a quick pip install:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
The requirements.txt file is generated from poetry.lock using: make export-reqs
Build and run the app inside a Docker container:
# Build image
make docker-build
# Run Streamlit app
make docker-run
This method skips Poetry and uses pip internally with a pinned requirements.txt exported from Poetry.
Note: The current Docker setup does not support the local model (llama-cpp-python) due to build constraints. If you require this functionality, please run the application in a local environment as described in the sections above.
Once your environment is ready:
make run # Launch the Streamlit app
make docker-run # Or run it via Docker
Then visit http://localhost:8501 in your browser to use the dashboard.
- Upload PDFs: Place your documents in a folder (e.g., data/pdfs_files/) and provide the path when prompted. Click "Submit PDFs" to ingest them.
- Ask Questions: Once the PDFs are processed, type your question in the question box. The system will return an answer based on the ingested content.
The repository includes a sample PDF dataset located in the data/sample_pdfs/ folder. This dataset contains 15 PDF files that can be used for testing.
Note: These sample PDF files are sourced from the Morningstar website and contain market predictions and reviews. They are included solely for demonstration and testing purposes.
For a more extensive test, a full-scale PDF dataset (approximately 150 MB) is available. You can download it from this Google Drive link.
- Agentic Pipelines: Introduce agent-based mechanisms to dynamically adjust retrieval strategies based on query complexity.
- Advanced Refinement Loops: Utilize techniques like retrieval grading and self-RAG to iteratively improve the quality of the final answer.
- Advanced Context Fusion: Implement sophisticated methods to combine retrieved information chunks more effectively.
- Self-RAG Mechanisms: Enable the system to self-improve by generating new retrieval queries based on past performance.
- Extensive Metadata: Enrich documents with additional metadata to improve retrieval precision.
- Hierarchical Structure: Incorporate hierarchical layers of information within the corpus.
- Domain-Specific Optimizations: Customize chunk sizes and retrieval models for specific industries or document types.
- Advanced Parsing: Enhance document processing to handle complex structures like tables and images.
The Two-Stage Consecutive RAG system offers a scalable and precise approach to document-based question answering by combining keyword and semantic retrieval in a sequential pipeline. This hybrid approach is designed to keep answers accurate and context-aware, even when working with large-scale, complex document collections.
This project is licensed under the MIT License. See the LICENSE file for details.
Contributions, suggestions, or feature requests are welcome!
If you'd like to contribute:
- Fork the repository
- Create a new branch (git checkout -b feature/your-feature-name)
- Commit your changes and open a pull request
Please ensure any new code is well-documented and tested.
For questions, feedback, or collaboration opportunities, feel free to reach out:
- Email: bbkhosseini@gmail.com
- LinkedIn: https://www.linkedin.com/in/bhosseini/
- GitHub: https://github.com/bab-git

