Semantic Search API

This repository contains a production-ready Semantic Search and Retrieval-Augmented Generation (RAG) system built using Python, FastAPI, Pinecone, and Llama 3 (served via Groq).

The system combines sparse keyword-based retrieval (BM25) with dense vector retrieval (Pinecone) to deliver accurate, low-latency, and scalable question answering.

The project is intentionally structured as a modular Python codebase, avoiding notebook-centric designs.

🚀 Key Features

Hybrid Retrieval
- Sparse retrieval using BM25 for exact keyword matching
- Dense semantic retrieval using sentence embeddings stored in Pinecone
- Weighted score fusion for improved recall and robustness

Production-Grade Architecture
- Modular Python files with clear separation of concerns
- Config-driven design with environment-based secrets
- Designed for extensibility (reranking, evaluation, APIs)

Low-Latency LLM Inference
- Llama 3 (8B / 70B) served via Groq
- Fast response times suitable for real-time applications

API-First Design
- FastAPI-based REST service
- Interactive Swagger documentation
- CLI and API interfaces supported
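
The weighted score fusion listed under Hybrid Retrieval can be illustrated with a small, dependency-free sketch. It is not the code in src/retrieval.py: the function names, the min-max normalization, and the `alpha` weight are assumptions chosen for illustration. BM25 scores and embedding similarities live on different scales, so each set is rescaled to [0, 1] before the weighted sum.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: treat every hit as equally strong.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    sparse: Dict[str, float],
    dense: Dict[str, float],
    alpha: float = 0.5,  # hypothetical weight given to the dense retriever
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Combine normalized sparse and dense scores with a weighted sum."""
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    fused = {}
    for doc_id in set(sparse_n) | set(dense_n):
        # A passage missing from one retriever contributes 0 from that side.
        fused[doc_id] = (
            alpha * dense_n.get(doc_id, 0.0)
            + (1.0 - alpha) * sparse_n.get(doc_id, 0.0)
        )
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Made-up scores, keyed by hypothetical chunk ids.
bm25_hits = {"chunk-1": 7.2, "chunk-2": 3.1, "chunk-3": 0.4}
dense_hits = {"chunk-2": 0.91, "chunk-3": 0.88, "chunk-4": 0.35}
ranked = fuse_scores(bm25_hits, dense_hits, alpha=0.6, top_k=3)
```

With these inputs, chunk-2 ranks first because both retrievers return it with a comparatively high score. Raising `alpha` favors semantic matches, and lowering it favors exact keyword matches.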

📁 Project Structure

semantic-search-api/
│
├── src/
│   ├── config.py        # Central configuration & environment loading
│   ├── api.py           # FastAPI application
│   ├── data.py          # Document loading and chunking
│   ├── retrieval.py     # BM25, dense, and hybrid retrievers
│   ├── generation.py    # Prompting and Llama 3 (Groq) integration
│   ├── pipeline.py      # End-to-end RAG orchestration
│   └── main.py          # CLI entry point
│
├── data/
│   └── documents/       # Input text documents
│
├── tools/
│   └── build_pinecone_index.py  # One-time vector index builder
│
├── requirements.txt
├── .env.example
├── .gitignore
└── README.md

🧠 System Overview

1. Documents are loaded and chunked into passages
2. BM25 retrieves keyword-relevant passages in-memory
3. Pinecone retrieves semantically relevant passages using embeddings
4. Retrieval scores are fused via a hybrid strategy
5. Top-ranked context is injected into a prompt
6. Llama 3 on Groq generates a grounded answer

This separation of retrieval, ranking, and generation makes the system easier to tune, debug, and scale.
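
To make the in-memory BM25 step more concrete, here is a minimal pure-Python scorer. It is a sketch under stated assumptions rather than the repository's retriever: the whitespace tokenizer, the `k1` and `b` defaults, and the smoothed IDF variant are illustrative choices, and the project may rely on a library instead.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(
    query: str, passages: List[str], k1: float = 1.5, b: float = 0.75
) -> List[float]:
    """Return one BM25 score per passage for the given query."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0

    # Document frequency: number of passages containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


passages = [
    "hybrid search combines bm25 and dense vectors",
    "groq serves llama 3 with low latency",
    "pinecone stores dense vectors",
]
scores = bm25_scores("hybrid search", passages)
```

Only the first passage contains the query terms, so it is the only one with a non-zero score. Because the tokenizer splits on whitespace only, a query such as "hybrid search?" would not match "search"; a real tokenizer would strip punctuation.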

⚙️ Setup

1. Clone the repository

git clone <your-repository-url>
cd semantic-search-api

2. Install dependencies

pip install -r requirements.txt

3. Configure environment variables

Create a .env file in the project root (the bundled .env.example can serve as a starting point):

GROQ_API_KEY=your_groq_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here

⚠️ Do not commit .env to version control.
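
As an illustration of how these two variables might be consumed, the sketch below reads them from the process environment and fails fast when one is missing. The `Settings` class and `load_settings` function are hypothetical names, not the contents of src/config.py, and the sketch assumes the values from .env have already been exported or loaded (for instance by a package such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("GROQ_API_KEY", "PINECONE_API_KEY")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the secrets named in the .env file."""
    groq_api_key: str
    pinecone_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required keys and raise early if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
    )
```

Failing at startup with the names of the missing variables is usually easier to diagnose than an authentication error raised later by the Groq or Pinecone client.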

🗄️ Build the Pinecone Index (One-Time)

Before running the system, build the vector index:

python tools/build_pinecone_index.py

This step:

- Embeds document chunks
- Uploads vectors to Pinecone
- Stores text as metadata for retrieval
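
The sketch below illustrates the shape of that work without calling Pinecone or loading an embedding model. Everything in it is an assumption for illustration: the word-window chunker, its sizes, the `toy_embed` stand-in (a hash, not a semantic embedding), and the id scheme. The record layout with `id`, `values`, and `metadata` mirrors the dictionary form commonly passed to a Pinecone upsert, but the actual tools/build_pinecone_index.py may differ.

```python
import hashlib
from typing import Callable, Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text: str, dim: int = 8) -> List[float]:
    """Stand-in for a sentence-embedding model; NOT semantically meaningful."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def build_records(
    doc_id: str,
    text: str,
    embed: Callable[[str], List[float]] = toy_embed,
    chunk_size: int = 200,
    overlap: int = 40,
) -> List[Dict]:
    """Turn one document into upsert-ready records, keeping the text as metadata."""
    records = []
    for i, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
        records.append(
            {
                "id": f"{doc_id}-{i}",  # hypothetical id scheme
                "values": embed(chunk),
                "metadata": {"text": chunk, "source": doc_id},
            }
        )
    return records


sample = " ".join(f"w{i}" for i in range(10))
records = build_records("doc", sample, chunk_size=4, overlap=1)
```

Storing the chunk text in metadata means a query result can be turned into prompt context directly, without a second lookup against the original files.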

▶️ Running the Application

Option 1: Run as an API (Recommended)

uvicorn src.api:app --reload

Swagger UI: http://127.0.0.1:8000/docs
Endpoint: POST /query

Example request:

{
  "query": "What is hybrid search?",
  "top_k": 5
}
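
For completeness, the same request can be sent from Python using only the standard library. This is a sketch that assumes the server is running locally on the default uvicorn port shown above; the helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular field names, since the response schema is not described here.

```python
import json
import urllib.request


def build_query_request(
    query: str, top_k: int = 5, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST /query request carrying the JSON body shown above."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query: str, top_k: int = 5) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_query_request(query, top_k)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage, once the API is running:
#   result = send_query("What is hybrid search?", top_k=5)
#   print(result)
```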

Option 2: Run as a CLI

python src/main.py

Example:

Query: Explain hybrid retrieval
Answer: ...

🔮 Extensibility

The system is designed to be easily extended with:

- Cross-encoder or LLM-based rerankers
- Streaming responses
- Authentication and rate limiting
- Vector store alternatives (FAISS, Chroma)
- Monitoring and feedback loops

🏆 Why This Project

This project demonstrates:

- Practical understanding of information retrieval systems
- Trade-offs between sparse and dense search
- Real-world RAG system design beyond notebooks
- Clean, production-style Python engineering
- API-first ML system deployment
