RAG Tool - Retrieval-Augmented Generation

An educational RAG (Retrieval-Augmented Generation) system with a FastAPI backend, React frontend, Qdrant vector database, and Ollama for inference.

Features

  • Document Management: Upload and process PDF, DOCX, TXT, MD, HTML, and XML files
  • Vector Search: Semantic search using Qdrant and SentenceTransformers
  • RAG Query: Answer questions based on document content
  • Streaming Responses: Real-time token streaming using Server-Sent Events
  • Chat History: Persistent chat sessions with conversation context
  • Multiple Interfaces: Web UI with specialized settings
  • Flexible Deployment: Docker or native installation

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Frontend  │────▶│   Backend    │────▶│   Qdrant    │
│  (React)    │     │  (FastAPI)   │     │  (Vectors)  │
│  Port 3000  │     │  Port 8000   │     │  Port 6333  │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Ollama    │
                    │   (Qwen)     │
                    │  Port 11434  │
                    └──────────────┘

Quick Start

Prerequisites

  • Ollama (required): Install from https://ollama.com
    # Linux / WSL
    curl -fsSL https://ollama.com/install.sh | sh
    
    # macOS
    brew install ollama
  • Docker (required): For Qdrant vector database and containerized deployment
  • Python 3.10+ (for native installation)
  • Node.js 18+ (for native installation)
  • uv (for native installation): curl -LsSf https://astral.sh/uv/install.sh | sh
  • tmux (optional, for automated setup): sudo apt install tmux or brew install tmux
  • 16GB+ RAM recommended

Windows Users: Run all commands in a WSL/Ubuntu terminal, not in PowerShell or CMD.

Apple Silicon (M1/M2/M3) Users: Docker images are built for both linux/amd64 and linux/arm64. For optimal performance, ensure Docker Desktop is configured to use Apple Virtualization framework.

Installation

Method 1: Docker (Recommended)

Pull pre-built images from GitHub Container Registry:

# Start Ollama on host
ollama serve &
ollama pull qwen2.5:7b-instruct

# Clone repository
git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2

# Start with Docker Compose
docker-compose pull && docker-compose up -d

Visit http://localhost:3000

Note: The backend container connects to Ollama running on your host machine. Ollama must be running before starting the containers.

If your backend image defaults to the openai provider, set these environment variables in docker-compose.yml:

environment:
  - LLM_PROVIDER=ollama
  - OLLAMA_HOST=host.docker.internal
  - OLLAMA_PORT=11434
  - OLLAMA_MODEL=qwen2.5:7b-instruct

Method 2: Automated Setup (Native)

Requires: Python 3.10+, Node.js 18+, uv, tmux, Docker

# Clone repository
git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2

# Run setup script
./scripts/setup_all.sh

# Start backend services (Qdrant, Ollama, Backend API)
./scripts/start_all.sh

# In a new terminal: start frontend
cd frontend
npm install
npm run dev

Visit http://localhost:3000

Method 3: Manual Setup (Native)

1. Clone the Repository

git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2

2. Backend Setup

cd backend
./setup.sh
source .venv/bin/activate

3. Install Ollama and Download Model

# Install Ollama (Linux/WSL)
curl -fsSL https://ollama.com/install.sh | sh

# macOS
# brew install ollama

# Pull the default model
ollama pull qwen2.5:7b-instruct

4. Start Services

Terminal 1 - Qdrant:

./scripts/start_qdrant.sh

Terminal 2 - Ollama:

ollama serve

Terminal 3 - Backend:

cd backend
source .venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

5. Frontend Setup

cd frontend
npm install
npm run dev

Visit http://localhost:3000

Usage

Document Upload

  1. Navigate to Upload Documents tab
  2. Select files (PDF, DOCX, TXT, MD, HTML, XML)
  3. Click Upload
  4. Documents are automatically chunked and embedded
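
The same upload can also be scripted against the REST API. A minimal sketch, assuming the backend runs on localhost:8000 and that the upload endpoint expects a multipart field named file (check http://localhost:8000/docs for the actual schema):

# Hypothetical upload script; the multipart field name "file" is an assumption.
import requests

with open("handbook.pdf", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/api/v1/documents/upload",
        files={"file": ("handbook.pdf", f, "application/pdf")},
    )

resp.raise_for_status()
print(resp.json())  # document metadata as returned by the backend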

Querying

  1. Navigate to Query Documents tab
  2. Enter your question
  3. Adjust parameters (temperature, top-k, etc.)
  4. View streaming response and retrieved sources

Chat Mode

  1. Navigate to Chat History tab
  2. Create new chat session
  3. Ask questions with conversation context
  4. View and manage chat history

Configuration

Backend Configuration

Edit backend/.env:

# Ollama Settings
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL=qwen2.5:7b-instruct
LLM_TEMPERATURE=0.7
LLM_MAX_TOKENS=512

# Document Processing
CHUNK_SIZE=512
CHUNK_OVERLAP=128

# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
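
CHUNK_SIZE and CHUNK_OVERLAP control how documents are split before embedding: each chunk holds up to 512 tokens and repeats the last 128 tokens of the previous chunk so context is not lost at chunk boundaries. An illustrative sketch (not the project's actual splitter) of what those two numbers mean:

# Illustrative only: cutting overlapping windows from a token list.
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=128):
    step = chunk_size - chunk_overlap              # each window starts 384 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start : start + chunk_size])
        if start + chunk_size >= len(tokens):      # last window already reaches the end of the document
            break
    return chunks

print([len(c) for c in chunk_tokens(["tok"] * 1000)])   # [512, 512, 232]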

Frontend Configuration

For local development (without Docker), create a .env file:

cd frontend
cp .env.example .env

Edit frontend/.env:

VITE_API_URL=http://localhost:8000

Note: For Docker deployment, .env is not needed. The frontend uses relative URLs and nginx proxies /api requests to the backend container.

API Endpoints

Documents

  • POST /api/v1/documents/upload - Upload document
  • GET /api/v1/documents/list - List all documents
  • DELETE /api/v1/documents/{id} - Delete document
  • POST /api/v1/documents/sync - Sync from data folder

Query

  • POST /api/v1/query/query - Non-streaming query
  • POST /api/v1/query/query/stream - Streaming query (SSE)
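
A hedged client sketch for both query endpoints. The request and response field names (query, top_k, the SSE data: payload) are assumptions rather than the documented schema; verify them against the interactive docs at http://localhost:8000/docs:

# Hypothetical query client; field names are assumptions.
import requests

BASE = "http://localhost:8000/api/v1/query"

# Non-streaming query
resp = requests.post(f"{BASE}/query", json={"query": "What is covered in chapter 2?", "top_k": 3})
print(resp.json())

# Streaming query: the endpoint emits Server-Sent Events, one "data:" line per chunk of tokens
with requests.post(f"{BASE}/query/stream", json={"query": "Summarise the document."}, stream=True) as stream:
    for line in stream.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line.removeprefix("data:").strip(), end=" ", flush=True)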

Chat

  • POST /api/v1/chat/new - Create session
  • GET /api/v1/chat/list - List sessions
  • GET /api/v1/chat/{id} - Get history
  • DELETE /api/v1/chat/{id} - Delete session
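
A minimal lifecycle sketch for the chat endpoints. The response fields (e.g. an id on the created session) are assumptions; check http://localhost:8000/docs for the real schema:

# Hypothetical chat-session lifecycle; the "id" field name is an assumption.
import requests

BASE = "http://localhost:8000/api/v1/chat"

session = requests.post(f"{BASE}/new").json()        # create a new session
session_id = session["id"]                           # assumed field name

print(requests.get(f"{BASE}/list").json())           # list all sessions
print(requests.get(f"{BASE}/{session_id}").json())   # fetch this session's history

requests.delete(f"{BASE}/{session_id}")              # delete the session again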

Project Structure

workshop-ragV2/
├── backend/
│   ├── app/
│   │   ├── api/          # API routes
│   │   ├── core/         # Configuration
│   │   ├── models/       # Data models
│   │   ├── schemas/      # Pydantic schemas
│   │   ├── services/     # Business logic
│   │   └── main.py       # FastAPI app
│   ├── pyproject.toml
│   └── setup.sh
├── frontend/
│   ├── src/
│   │   ├── components/   # React components
│   │   ├── services/     # API client
│   │   └── App.tsx
│   └── package.json
├── data/                 # Document storage
├── chat_history/         # Chat sessions
├── qdrant_storage/       # Vector DB
├── models/              # Downloaded models
└── scripts/             # Setup scripts

Development

Backend Development

cd backend
source .venv/bin/activate

# Run with auto-reload
uvicorn app.main:app --reload

# Run tests
pytest

# Format code
black app/
isort app/

Frontend Development

cd frontend

# Development server
npm run dev

# Build for production
npm run build

# Type checking
npm run type-check

Troubleshooting

Ollama Connection Issues (Docker)

If the backend container cannot connect to Ollama running on your host:

1. Ensure Ollama listens on all interfaces:

# Stop Ollama if running
killall ollama

# Start with OLLAMA_HOST set to bind to all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve

2. Verify connectivity from container:

# Test connection from inside the backend container
docker exec -it workshop-ragv2-backend-1 curl http://host.docker.internal:11434/api/tags

3. Check firewall settings:

  • macOS: System Settings → Network → Firewall → Allow Ollama
  • Linux: sudo ufw allow 11434/tcp

Alternative: Docker Model Runner (macOS)

If you're on macOS with Docker Desktop, you can use Docker Model Runner instead of Ollama. It integrates directly with Docker Desktop and has GPU access on Apple Silicon.

1. Enable Docker Model Runner:

In Docker Desktop: Settings → Features in development → Enable "Docker Model Runner"

2. Pull a model:

docker model pull ai/qwen2.5:7B-Q4_K_M

3. Update docker-compose.yml environment:

environment:
  - OLLAMA_HOST=model-runner.docker.internal
  - OLLAMA_PORT=80

Note: Docker Model Runner uses an OpenAI-compatible API. Basic inference should work, but some Ollama-specific features may differ.

Backend won't start

  • Check if Qdrant is running: curl http://localhost:6333
  • Check if Ollama is running: curl http://localhost:11434/api/tags
  • Verify .env configuration

Model download fails

  • Check disk space
  • Verify internet connection
  • Try: ollama pull qwen2.5:7b-instruct

Out of memory

  • Use a smaller model: ollama pull qwen2.5:3b-instruct
  • Reduce LLM_MAX_TOKENS in configuration
  • Close other memory-intensive applications

Slow inference

  • Ensure Ollama is using GPU (check with ollama ps)
  • Reduce LLM_MAX_TOKENS
  • Use a smaller/faster model

Technical Details

  • Embedding Model: all-MiniLM-L6-v2 (384 dimensions)
  • LLM: Qwen 2.5 7B Instruct (via Ollama)
  • Chunking: 512 tokens with 128 token overlap
  • Vector Distance: Cosine similarity
  • LLM Backend: Ollama (port 11434)
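
A minimal sketch of the retrieval side described above, using the same embedding model and distance metric. It mirrors the stack (SentenceTransformers embeddings ranked by cosine similarity), not the backend's actual code:

# Embed texts with all-MiniLM-L6-v2 (384-dimensional vectors) and rank them by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = ["Qdrant stores the document vectors.", "Ollama serves the Qwen model."]
query = "Where are the embeddings stored?"

chunk_vecs = model.encode(chunks)            # shape (2, 384)
query_vec = model.encode(query)              # shape (384,)

scores = util.cos_sim(query_vec, chunk_vecs)[0]
best = int(scores.argmax())
print(chunks[best], float(scores[best]))     # best-matching chunk and its similarity score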

Docker Images

Pre-built multi-architecture images (linux/amd64 and linux/arm64) are available on GitHub Container Registry:

# Pull images
docker pull ghcr.io/aihpi/workshop-ragv2-backend:latest
docker pull ghcr.io/aihpi/workshop-ragv2-frontend:latest

# Or use docker-compose (automatically pulls)
docker-compose pull
docker-compose up -d

Building Locally (Development)

For local development, use the dev compose override:

# Build and start containers locally
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up --build

Note: No .env file is required for Docker builds. The frontend uses relative URLs and nginx handles API proxying to the backend container.

License

See LICENSE file for details.

Contributing

  1. Fork the repository
  2. Create feature branch
  3. Commit changes
  4. Push to branch
  5. Open pull request

Support

For issues and questions, please open a GitHub issue.
