A production-ready RAG (Retrieval-Augmented Generation) system with automated evaluation using TruLens. This system enables you to ingest documents, query them with semantic search, and evaluate the quality of responses using heuristic-based metrics.
- RAG Framework: LlamaIndex
- Vector Database: Milvus (HNSW index)
- LLM: Ollama (Llama 3 8B)
- Embedding Model: nomic-embed-text (Ollama)
- Evaluation: SimulatedEvaluator (heuristic-based metrics)
- API: FastAPI
- Deployment: Docker Compose
- Multi-format Document Ingestion: Supports PDF, Word, Excel, PowerPoint, HTML, CSV, and more via MarkItDown
- Semantic Search: Vector-based retrieval with configurable top_k
- Automated Evaluation: Faithfulness, Relevance, Context Precision, Context Recall metrics
- Docker-based Deployment: All services orchestrated with Docker Compose
- RESTful API: Simple `/ingest` and `/query` endpoints
- Full Test Coverage: >80% test coverage with pytest
- Clone and start services:

  ```bash
  docker-compose up -d
  ```

- Wait for services to be ready (approximately 20 seconds):

  ```bash
  sleep 20
  ```

- Download Ollama models (one `pull` per model):

  ```bash
  docker-compose exec ollama ollama pull llama3:8b
  docker-compose exec ollama ollama pull nomic-embed-text
  ```

- Ingest a document:

  ```bash
  curl -X POST http://localhost:8000/ingest \
    -F "file=@sample.pdf"
  ```

- Query the system:

  ```bash
  curl -X POST http://localhost:8000/query \
    -H "Content-Type: application/json" \
    -d '{"query": "What is RAG?"}'
  ```

- View the TruLens dashboard at http://localhost:8501
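Instead of a fixed `sleep 20`, you can poll until the API actually responds. A minimal sketch, assuming the `/health` endpoint documented below returns HTTP 200 once the services are up; the `wait_for` and `api_is_healthy` helpers are illustrative, not part of the project:

```python
import time
import urllib.request
import urllib.error

def wait_for(check, retries=30, delay=1.0):
    """Poll `check` until it returns True or retries are exhausted."""
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay)
    return False

def api_is_healthy(url="http://localhost:8000/health"):
    """Return True if the FastAPI /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Usage: wait_for(api_is_healthy) returns True once the API answers,
# or False after ~30 seconds.
```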
Get started quickly by cloning the repository from GitHub:
```bash
# Clone the repository
git clone https://github.com/zabwie/Closed-Loop-RAG-System
cd rag-system

# Start all services
docker-compose up -d

# Wait for services to initialize (approximately 20 seconds)
sleep 20

# Download required Ollama models
docker-compose exec ollama ollama pull llama3:8b
docker-compose exec ollama ollama pull nomic-embed-text

# Verify services are running
docker-compose ps
```

Once services are running, you can ingest documents and query the system using the API endpoints documented below.
- Docker and Docker Compose installed
- Python 3.11+ (for local development)
- 8GB+ VRAM (for running Llama 3 8B locally)
- 10GB+ disk space (for Docker images and data)
- Clone the repository:

  ```bash
  # Clone via HTTPS
  git clone https://github.com/zabwie/Closed-Loop-RAG-System

  # Or clone via SSH
  git clone git@github.com:zabwie/Closed-Loop-RAG-System

  # Navigate into the project directory
  cd rag-system
  ```

- Configure environment variables (optional):

  ```bash
  cp .env.example .env
  # Edit .env with your custom settings
  ```

- Start all services:

  ```bash
  docker-compose up -d
  ```

- Verify services are running:

  ```bash
  docker-compose ps
  ```

- Download Ollama models:

  ```bash
  docker-compose exec ollama ollama pull llama3:8b
  docker-compose exec ollama ollama pull nomic-embed-text
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  pip install -r requirements-dev.txt
  ```

- Set PYTHONPATH:

  ```bash
  export PYTHONPATH=$(pwd)/src   # Linux/Mac
  set PYTHONPATH=%cd%\src        # Windows
  ```

- Run tests:

  ```bash
  pytest tests/ -v --cov=src/rag_system
  ```
```
┌─────────────────────────────────────────────────────────────┐
│                          FastAPI                            │
│  ┌──────────────┐              ┌──────────────┐             │
│  │   /ingest    │              │    /query    │             │
│  └──────┬───────┘              └──────┬───────┘             │
└─────────┼──────────────────────────────┼───────────────────┘
          │                              │
          ▼                              ▼
┌──────────────────────┐   ┌──────────────────────────────┐
│  Ingestion Pipeline  │   │         RAG Pipeline         │
│  ┌────────────────┐  │   │  ┌────────────────────────┐  │
│  │   MarkItDown   │  │   │  │    EmbeddingService    │  │
│  │   Converter    │  │   │  │   (nomic-embed-text)   │  │
│  └────────┬───────┘  │   │  └───────────┬────────────┘  │
│           │          │   │              │               │
│           ▼          │   │              ▼               │
│  ┌────────────────┐  │   │  ┌────────────────────────┐  │
│  │  TextChunker   │  │   │  │   MilvusVectorStore    │  │
│  │  (512 tokens)  │  │   │  │     (HNSW index)       │  │
│  └────────┬───────┘  │   │  └───────────┬────────────┘  │
│           │          │   │              │               │
│           ▼          │   │              ▼               │
│  ┌────────────────┐  │   │  ┌────────────────────────┐  │
│  │   Embedding    │  │   │  │      OllamaClient      │  │
│  │    Service     │  │   │  │      (llama3:8b)       │  │
│  └────────┬───────┘  │   │  └───────────┬────────────┘  │
│           │          │   │              │               │
│           ▼          │   │              ▼               │
│  ┌────────────────┐  │   │  ┌────────────────────────┐  │
│  │     Milvus     │  │   │  │   SimulatedEvaluator   │  │
│  │  VectorStore   │  │   │  │  (RAG Triad Metrics)   │  │
│  └────────────────┘  │   │  └────────────────────────┘  │
└──────────────────────┘   └──────────────────────────────┘
```
Ingestion Pipeline:
- Document upload → MarkItDown conversion (extracts text and metadata)
- Text chunking → Fixed-size chunks (512 tokens, 50 overlap)
- Embedding generation → nomic-embed-text model
- Milvus storage → Vector indexing with HNSW
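The chunking step above can be sketched as fixed-size windows with overlap. This is a simplified illustration operating on a pre-tokenized list; the project's actual `TextChunker` (chunker.py) may differ in detail:

```python
def chunk_text(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks, with `overlap` tokens
    shared between each chunk and its successor."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk already covers the tail
    return chunks
```

With the defaults, each chunk after the first begins 462 tokens into the previous one, so the last 50 tokens of chunk *n* reappear at the start of chunk *n+1*.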
Query Pipeline:
- Query embedding → nomic-embed-text model
- Similarity search → Milvus HNSW index (top_k results)
- Context assembly → Join retrieved documents
- LLM generation → llama3:8b with RAG prompt
- Response evaluation → SimulatedEvaluator metrics
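Context assembly and generation amount to joining the retrieved chunks and wrapping them in an instruction before handing the prompt to llama3:8b. A minimal sketch; the actual prompt template in `rag_engine.py` may be worded differently:

```python
def build_rag_prompt(query, retrieved_chunks):
    """Join retrieved chunk texts into one context block and wrap it
    in a grounded-answering instruction for the LLM."""
    context = "\n\n".join(chunk["text"] for chunk in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```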
| Service | Image | Ports | Purpose |
|---|---|---|---|
| etcd | quay.io/coreos/etcd:v3.5.5 | 2379 | Metadata storage for Milvus |
| minio | minio/minio:RELEASE.2023-03-20T20-16-18Z | 9000 | Object storage for Milvus |
| milvus-standalone | milvusdb/milvus:v2.3.3 | 19530, 9091 | Vector database |
| ollama | ollama/ollama:latest | 11434 | LLM and embedding service |
| trulens | ghcr.io/truera/trulens:latest | 8501 | Evaluation dashboard |
| rag-app | python:3.11-slim | 8000 | FastAPI application |
Upload and ingest a document into the RAG system.
Request:
```http
POST /ingest
Content-Type: multipart/form-data

file: <binary file data>
```

Response:

```json
{
  "document_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "chunk_count": 42,
  "source": "sample.pdf",
  "error": null
}
```

Supported Formats: PDF, Word (.docx), Excel (.xlsx), PowerPoint (.pptx), HTML, CSV, Markdown, and more.
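For programmatic ingestion without `curl`, the multipart upload can be built with the standard library alone. The `build_multipart` and `ingest` helpers below are illustrative, not part of the project:

```python
import uuid
import urllib.request

def build_multipart(field, filename, data):
    """Build a multipart/form-data body and matching Content-Type
    header for a single file field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def ingest(path, url="http://localhost:8000/ingest"):
    """POST a local file to the /ingest endpoint; returns the raw JSON body."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```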
Query the RAG system with evaluation metrics.
Request:
```http
POST /query
Content-Type: application/json
```

```json
{
  "query": "What is RAG?",
  "top_k": 5
}
```

Response:

```json
{
  "query": "What is RAG?",
  "answer": "RAG (Retrieval-Augmented Generation) is a technique that combines...",
  "sources": [
    {
      "text": "RAG is a technique that enhances LLM responses...",
      "score": 0.85,
      "metadata": {
        "source": "sample.pdf",
        "chunk_index": 3
      }
    }
  ],
  "retrieved_count": 5,
  "evaluation": {
    "faithfulness": 0.75,
    "relevance": 0.80,
    "context_precision": 0.70,
    "context_recall": 0.60,
    "overall_score": 0.72
  }
}
```

Parameters:
- `query` (string, required): The question to ask
- `top_k` (integer, optional): Number of documents to retrieve (default: 5, range: 1-20)
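The parameter contract can be expressed as a small validator. This is an illustrative sketch; the API itself presumably enforces the same rules via its Pydantic models in `api/models.py`:

```python
def validate_query_request(payload):
    """Validate a /query payload: `query` is required and non-empty,
    `top_k` is optional and must fall in [1, 20] (default 5)."""
    query = payload.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    top_k = payload.get("top_k", 5)
    if not isinstance(top_k, int) or not 1 <= top_k <= 20:
        raise ValueError("top_k must be an integer between 1 and 20")
    return {"query": query, "top_k": top_k}
```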
Evaluation Metrics:
- Faithfulness: Word overlap between answer and sources (0-1)
- Relevance: Word overlap between query and answer (0-1)
- Context Precision: Average source score (0-1)
- Context Recall: Retrieved count / ideal count (0-1)
- Overall Score: Weighted average (faithfulness 30%, relevance 30%, precision 20%, recall 20%)
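These heuristics can be sketched as word-overlap and averaging functions. The exact details (the direction of each overlap, the "ideal count" for recall) are assumptions here; the authoritative implementation is `SimulatedEvaluator` in `trulens_evaluator.py`:

```python
def word_overlap(a, b):
    """Fraction of distinct words in `a` that also appear in `b` (0-1)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa) if wa else 0.0

def evaluate(query, answer, sources, ideal_count=5):
    """Compute the heuristic metrics listed above for one query/answer pair.
    `sources` is a list of {"text": ..., "score": ...} dicts."""
    source_text = " ".join(s["text"] for s in sources)
    faithfulness = word_overlap(answer, source_text)   # answer grounded in sources
    relevance = word_overlap(query, answer)            # answer addresses the query
    precision = sum(s["score"] for s in sources) / len(sources) if sources else 0.0
    recall = min(len(sources) / ideal_count, 1.0)
    overall = 0.3 * faithfulness + 0.3 * relevance + 0.2 * precision + 0.2 * recall
    return {
        "faithfulness": faithfulness,
        "relevance": relevance,
        "context_precision": precision,
        "context_recall": recall,
        "overall_score": round(overall, 2),
    }
```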
Health check endpoint.
Response:

```json
{
  "status": "healthy",
  "services": {
    "ollama": "connected",
    "milvus": "connected"
  }
}
```

| Variable | Default | Description |
|---|---|---|
| `OLLAMA_URL` | http://localhost:11434 | Ollama API endpoint |
| `MILVUS_HOST` | localhost | Milvus host |
| `MILVUS_PORT` | 19530 | Milvus port |
| `MODEL_NAME` | llama3:8b | LLM model name |
| `EMBEDDING_MODEL` | nomic-embed-text | Embedding model name |
| `COLLECTION_NAME` | documents | Milvus collection name |
| `CHUNK_SIZE` | 512 | Chunk size in tokens |
| `CHUNK_OVERLAP` | 50 | Chunk overlap in tokens |
| `TOP_K` | 5 | Default number of results to retrieve |
- Index Type: HNSW (Hierarchical Navigable Small World)
- Metric Type: IP (Inner Product)
- Dimension: 768 (nomic-embed-text embedding dimension)
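With pymilvus, this configuration is typically passed as index parameters when the collection's index is created. The `M` and `efConstruction` values below are illustrative tuning assumptions, not the project's actual settings:

```python
# Illustrative Milvus index parameters matching the configuration above.
# M (graph connectivity) and efConstruction (build-time search width) are
# HNSW tuning knobs; the values here are assumptions.
INDEX_PARAMS = {
    "index_type": "HNSW",
    "metric_type": "IP",  # inner-product similarity
    "params": {"M": 16, "efConstruction": 200},
}

EMBEDDING_DIM = 768  # nomic-embed-text output dimension

# With pymilvus this would typically be applied as:
#   collection.create_index(field_name="embedding", index_params=INDEX_PARAMS)
```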
Problem: Docker services fail to start or crash immediately.
Solution:
```bash
# Check service status
docker-compose ps

# Check service logs
docker-compose logs [service-name]

# Restart services
docker-compose restart

# Rebuild and restart
docker-compose up -d --build
```

Problem: Error message "model 'llama3:8b' not found".
Solution:
```bash
# Pull required models
docker-compose exec ollama ollama pull llama3:8b
docker-compose exec ollama ollama pull nomic-embed-text

# Verify models are installed
docker-compose exec ollama ollama list
```

Problem: Error connecting to Milvus database.
Solution:
```bash
# Check Milvus logs
docker-compose logs milvus-standalone

# Verify etcd and minio are running
docker-compose ps etcd minio

# Restart Milvus
docker-compose restart milvus-standalone

# Check Milvus health
docker-compose exec milvus-standalone curl http://localhost:9091/healthz
```

Problem: Evaluation scores are all zero or missing.
Solution:
```bash
# Check TruLens logs
docker-compose logs trulens

# Verify database file exists
ls -la trulens.db

# Restart TruLens service
docker-compose restart trulens
```

Problem: Ollama crashes with OOM (Out of Memory) errors.
Solution:
```bash
# Check available memory
docker stats

# Reduce Ollama memory limit in docker-compose.yml
# Add: deploy: resources: limits: memory: 8g

# Use a smaller quantized model
docker-compose exec ollama ollama pull llama3:8b-instruct-q4_0
```

Problem: Queries take more than 10 seconds to complete.
Solution:
```bash
# Check Milvus index build status (run from the app container, which has pymilvus)
docker-compose exec rag-app python -c "from pymilvus import connections, utility; connections.connect(host='milvus-standalone', port='19530'); print(utility.index_building_progress('documents'))"

# Reduce top_k parameter in query
curl -X POST http://localhost:8000/query -H "Content-Type: application/json" -d '{"query": "...", "top_k": 3}'

# Increase Milvus resources in docker-compose.yml
```

Problem: Error "port is already allocated".
Solution:
```bash
# Find process using the port
netstat -ano | findstr :8000   # Windows
lsof -i :8000                  # Linux/Mac

# Kill the process or change the port in docker-compose.yml
```

```bash
# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ -v --cov=src/rag_system

# Run specific test categories
pytest tests/unit/ -v -m unit
pytest tests/integration/ -v -m integration
pytest tests/e2e/ -v -m e2e

# Run specific test file
pytest tests/unit/ingestion/test_markitdown_converter.py -v
```

```bash
# Format code with black
black src/ tests/

# Lint with ruff
ruff check src/ tests/

# Fix linting issues
ruff check --fix src/ tests/
```

```
RAG/
├── src/
│   └── rag_system/
│       ├── ingestion/           # Document ingestion pipeline
│       │   ├── markitdown_converter.py
│       │   ├── chunker.py
│       │   ├── embeddings.py
│       │   └── ingester.py
│       ├── vector_store/        # Milvus integration
│       │   └── milvus_client.py
│       ├── generation/          # LLM and RAG engine
│       │   ├── ollama_client.py
│       │   └── rag_engine.py
│       ├── evaluation/          # Evaluation metrics
│       │   └── trulens_evaluator.py
│       ├── api/                 # FastAPI endpoints
│       │   ├── main.py
│       │   └── models.py
│       └── utils/               # Utilities
│           └── config.py
├── tests/
│   ├── unit/                    # Unit tests
│   ├── integration/             # Integration tests
│   └── e2e/                     # End-to-end tests
├── docker-compose.yml           # Docker orchestration
├── Dockerfile                   # FastAPI container
├── pyproject.toml               # Project configuration
├── requirements.txt             # Production dependencies
└── requirements-dev.txt         # Development dependencies
```
MIT License - see LICENSE file for details.
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Write tests for your changes
- Ensure all tests pass with >80% coverage
- Submit a pull request
- LlamaIndex: RAG framework
- Milvus: Vector database
- Ollama: LLM and embedding service
- TruLens: Evaluation framework
- MarkItDown: Document conversion library
Songs listened to during the making: 11:11 (Roa), PPC (ROA, Hades66), MAMI 100PRE SABE (interlude) (Alvaro Diaz, Nsqk)