An educational RAG (Retrieval-Augmented Generation) system with a FastAPI backend, React frontend, Qdrant vector database, and Ollama for inference.
- Document Management: Upload and process PDF, DOCX, TXT, MD, HTML, and XML files
- Vector Search: Semantic search using Qdrant and SentenceTransformers
- RAG Query: Answer questions based on document content
- Streaming Responses: Real-time token streaming using Server-Sent Events
- Chat History: Persistent chat sessions with conversation context
- Multiple Interfaces: Web UI with specialized settings
- Flexible Deployment: Docker or native installation
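The "Vector Search" feature above ranks chunks by vector similarity rather than keyword overlap. A minimal sketch of the idea, using tiny toy vectors in place of real SentenceTransformers embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 3-dimensional "embeddings" (the real ones are 384-dimensional)
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, chunks))  # → [0, 1]
```

The two chunks pointing in roughly the same direction as the query win; the orthogonal one is ignored.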
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Frontend │────▶│ Backend │────▶│ Qdrant │
│ (React) │ │ (FastAPI) │ │ (Vectors) │
│ Port 3000 │ │ Port 8000 │ │ Port 6333 │
└─────────────┘ └──────┬───────┘ └─────────────┘
│
▼
┌──────────────┐
│ Ollama │
│ (Qwen) │
│ Port 11434 │
└──────────────┘
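The request flow through these services can be sketched end to end. The functions below are stand-ins for the real embedding, Qdrant, and Ollama calls; only the order of operations mirrors the actual pipeline:

```python
def embed(text):
    # Stand-in for SentenceTransformers encoding
    return [float(len(text))]

def search_chunks(query_vec, k=3):
    # Stand-in for a Qdrant similarity search
    return ["chunk about topic A", "chunk about topic B"][:k]

def generate(prompt):
    # Stand-in for an Ollama completion call
    return f"Answer based on {prompt.count('chunk')} retrieved chunks."

def rag_query(question):
    """Embed the question, retrieve context, and prompt the LLM."""
    context = search_chunks(embed(question))
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return generate(prompt)

print(rag_query("What is RAG?"))  # → Answer based on 2 retrieved chunks.
```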
- Ollama (required): Install from https://ollama.com
# Linux / WSL
curl -fsSL https://ollama.com/install.sh | sh
# macOS
brew install ollama
- Docker (required): For Qdrant vector database and containerized deployment
- Windows: Docker Desktop with WSL2
- macOS: Docker Desktop
- Linux: Docker Engine
- Python 3.10+ (for native installation)
- Node.js 18+ (for native installation)
- uv (for native installation):
curl -LsSf https://astral.sh/uv/install.sh | sh
- tmux (optional, for automated setup):
sudo apt install tmux   # or: brew install tmux
- 16GB+ RAM recommended
Windows Users: Run all commands in WSL/Ubuntu terminal, not PowerShell or CMD.
Apple Silicon (M1/M2/M3) Users: Docker images are built for both linux/amd64 and linux/arm64. For optimal performance, ensure Docker Desktop is configured to use the Apple Virtualization framework.
Pull pre-built images from GitHub Container Registry:
# Start Ollama on host
ollama serve &
ollama pull qwen2.5:7b-instruct
# Clone repository
git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2
# Start with Docker Compose
docker-compose pull && docker-compose up -d
Visit http://localhost:3000
Note: The backend container connects to Ollama running on your host machine. Ollama must be running before starting the containers.
If your backend image defaults to openai, set these in docker-compose.yml:
environment:
  - LLM_PROVIDER=ollama
  - OLLAMA_HOST=host.docker.internal
  - OLLAMA_PORT=11434
  - OLLAMA_MODEL=qwen2.5:7b-instruct
Requires: Python 3.10+, Node.js 18+, uv, tmux, Docker
# Clone repository
git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2
# Run setup script
./scripts/setup_all.sh
# Start backend services (Qdrant, Ollama, Backend API)
./scripts/start_all.sh
# In a new terminal: start frontend
cd frontend
npm install
npm run dev
Visit http://localhost:3000
1. Clone Repository
git clone https://github.com/aihpi/workshop-ragV2.git
cd workshop-ragV2
2. Backend Setup
cd backend
./setup.sh
source .venv/bin/activate
3. Install Ollama and Download Model
# Install Ollama (Linux/WSL)
curl -fsSL https://ollama.com/install.sh | sh
# macOS
# brew install ollama
# Pull the default model
ollama pull qwen2.5:7b-instruct
4. Start Services
Terminal 1 - Qdrant:
./scripts/start_qdrant.sh
Terminal 2 - Ollama:
ollama serve
Terminal 3 - Backend:
cd backend
source .venv/bin/activate
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
5. Frontend Setup
cd frontend
npm install
npm run dev
Visit http://localhost:3000
- Navigate to Upload Documents tab
- Select files (PDF, DOCX, TXT, MD, HTML, XML)
- Click Upload
- Documents are automatically chunked and embedded
- Navigate to Query Documents tab
- Enter your question
- Adjust parameters (temperature, top-k, etc.)
- View streaming response and retrieved sources
- Navigate to Chat History tab
- Create new chat session
- Ask questions with conversation context
- View and manage chat history
Edit backend/.env:
# Ollama Settings
OLLAMA_HOST=localhost
OLLAMA_PORT=11434
OLLAMA_MODEL=qwen2.5:7b-instruct
LLM_TEMPERATURE=0.7
LLM_MAX_TOKENS=512
# Document Processing
CHUNK_SIZE=512
CHUNK_OVERLAP=128
# Qdrant
QDRANT_HOST=localhost
QDRANT_PORT=6333
For local development (without Docker), create a .env file:
cd frontend
cp .env.example .env
Edit frontend/.env:
VITE_API_URL=http://localhost:8000
Note: For Docker deployment, .env is not needed. The frontend uses relative URLs and nginx proxies /api requests to the backend container.
- POST /api/v1/documents/upload - Upload document
- GET /api/v1/documents/list - List all documents
- DELETE /api/v1/documents/{id} - Delete document
- POST /api/v1/documents/sync - Sync from data folder
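The upload endpoint takes multipart/form-data. A standard-library sketch of how such a request could be assembled (the request is only built here, not sent, and the form field name "file" is an assumption about the backend schema):

```python
import io
import uuid
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"

def build_upload_request(filename, data: bytes):
    """Build a multipart/form-data upload request (not sent here)."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        (
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{BASE_URL}/documents/upload",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

req = build_upload_request("notes.md", b"# My notes")
print(req.get_method(), req.full_url)
```

In practice a client library (e.g. requests or the browser's FormData) handles this encoding for you.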
- POST /api/v1/query/query - Non-streaming query
- POST /api/v1/query/query/stream - Streaming query (SSE)
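The streaming endpoint emits Server-Sent Events, where each event line starts with `data: `. A minimal parser for that wire format (the sample payload below is illustrative, not this backend's exact event schema):

```python
def parse_sse(raw: str):
    """Yield the data payload of each SSE event in a raw stream."""
    for line in raw.splitlines():
        if line.startswith("data: "):
            yield line[len("data: "):]

sample = "data: Hello\n\ndata:  world\n\ndata: [DONE]\n\n"
tokens = [t for t in parse_sse(sample) if t != "[DONE]"]
print("".join(tokens))  # → Hello world
```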
- POST /api/v1/chat/new - Create session
- GET /api/v1/chat/list - List sessions
- GET /api/v1/chat/{id} - Get history
- DELETE /api/v1/chat/{id} - Delete session
workshop-rag/
├── backend/
│ ├── app/
│ │ ├── api/ # API routes
│ │ ├── core/ # Configuration
│ │ ├── models/ # Data models
│ │ ├── schemas/ # Pydantic schemas
│ │ ├── services/ # Business logic
│ │ └── main.py # FastAPI app
│ ├── pyproject.toml
│ └── setup.sh
├── frontend/
│ ├── src/
│ │ ├── components/ # React components
│ │ ├── services/ # API client
│ │ └── App.tsx
│ └── package.json
├── data/ # Document storage
├── chat_history/ # Chat sessions
├── qdrant_storage/ # Vector DB
├── models/ # Downloaded models
└── scripts/ # Setup scripts
cd backend
source .venv/bin/activate
# Run with auto-reload
uvicorn app.main:app --reload
# Run tests
pytest
# Format code
black app/
isort app/
cd frontend
# Development server
npm run dev
# Build for production
npm run build
# Type checking
npm run type-check
If the backend container cannot connect to Ollama running on your host:
1. Ensure Ollama listens on all interfaces:
# Stop Ollama if running
killall ollama
# Start with OLLAMA_HOST set to bind to all interfaces
OLLAMA_HOST=0.0.0.0 ollama serve
2. Verify connectivity from container:
# Test connection from inside the backend container
docker exec -it workshop-ragv2-backend-1 curl http://host.docker.internal:11434/api/tags
3. Check firewall settings:
- macOS: System Settings → Network → Firewall → Allow Ollama
- Linux:
sudo ufw allow 11434/tcp
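Before digging into firewall rules, a quick programmatic check can confirm whether the port is reachable at all. A small sketch using only the standard library (the host names are the ones used in this setup):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False

if __name__ == "__main__":
    for host in ("localhost", "host.docker.internal"):
        status = "reachable" if can_connect(host, 11434) else "unreachable"
        print(f"{host}:11434 {status}")
```

A TCP connect only proves the port is open; follow up with the curl check against /api/tags to confirm Ollama itself is answering.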
If you're on macOS with Docker Desktop, you can use Docker Model Runner instead of Ollama. It integrates directly with Docker Desktop and has GPU access on Apple Silicon.
1. Enable Docker Model Runner:
In Docker Desktop: Settings → Features in development → Enable "Docker Model Runner"
2. Pull a model:
docker model pull ai/qwen2.5:7B-Q4_K_M
3. Update docker-compose.yml environment:
environment:
- OLLAMA_HOST=model-runner.docker.internal
  - OLLAMA_PORT=80
Note: Docker Model Runner uses an OpenAI-compatible API. Basic inference should work, but some Ollama-specific features may differ.
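An OpenAI-compatible endpoint expects a chat-completions JSON body like the one below. This sketch only builds the request without sending it; `/v1/chat/completions` is the conventional OpenAI path, and Model Runner's actual URL prefix may differ:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, question: str):
    """Build an OpenAI-style chat-completions request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "stream": True,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "http://model-runner.docker.internal:80", "ai/qwen2.5:7B-Q4_K_M", "Hi"
)
print(req.full_url)
```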
- Check if Qdrant is running:
curl http://localhost:6333
- Check if Ollama is running:
curl http://localhost:11434/api/tags
- Verify .env configuration
- Check disk space
- Verify internet connection
- Try:
ollama pull qwen2.5:7b-instruct
- Use a smaller model:
ollama pull qwen2.5:3b-instruct
- Reduce LLM_MAX_TOKENS in configuration
- Close other memory-intensive applications
- Ensure Ollama is using GPU (check with ollama ps)
- Reduce LLM_MAX_TOKENS
- Embedding Model: all-MiniLM-L6-v2 (384 dimensions)
- LLM: Qwen 2.5 7B Instruct (via Ollama)
- Chunking: 512 tokens with 128 token overlap
- Vector Distance: Cosine similarity
- LLM Backend: Ollama (port 11434)
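The chunking scheme above (512-token windows with a 128-token overlap) is a sliding window. A sketch with small numbers, where "tokens" are simply list items for illustration:

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token list into overlapping fixed-size windows."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

words = [f"w{i}" for i in range(10)]
for c in chunk_tokens(words, chunk_size=4, overlap=2):
    print(c)
```

Each window repeats the tail of the previous one, so a sentence split at a chunk boundary still appears whole in at least one chunk.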
Pre-built multi-architecture images (linux/amd64 and linux/arm64) are available on GitHub Container Registry:
# Pull images
docker pull ghcr.io/aihpi/workshop-ragv2-backend:latest
docker pull ghcr.io/aihpi/workshop-ragv2-frontend:latest
# Or use docker-compose (automatically pulls)
docker-compose pull
docker-compose up -d
For local development, use the dev compose override:
# Build and start containers locally
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up --build
Note: No .env file is required for Docker builds. The frontend uses relative URLs and nginx handles API proxying to the backend container.
See LICENSE file for details.
- Fork the repository
- Create feature branch
- Commit changes
- Push to branch
- Open pull request
For issues and questions, please open a GitHub issue.
