Multimodal RAG System

100% local AI knowledge base with image recognition and citation support. No cloud, no external APIs — fully GDPR-compliant.

What This Does

This system lets you ask natural-language questions about your internal documents (PDF, DOCX) and get answers with exact source citations and embedded images — all running locally.

Document Ingestion

Documents are automatically processed through a pipeline:

n8n detects new/changed files and triggers ingestion
Docling analyzes documents — extracts text, images, and runs OCR
Backend (FastAPI) chunks text with HybridChunker and creates embeddings (multilingual-e5-large-instruct, 1024 dim) plus BM25 sparse vectors
Qdrant stores the vectors with metadata (source, page, image references)
PostgreSQL stores extracted images and file hashes for idempotent re-ingestion

Question & Answer

User asks a question via Open WebUI
n8n receives the query and routes it to a Qwen3-4B query router
If RAG is needed: the Backend performs hybrid search (dense + sparse) against Qdrant, reranks results, resolves image placeholders, and assembles context
Qwen3-4B Instruct (Q4) generates an answer with source citations and embedded images
If RAG is not needed: the query goes directly to Ollama (no retrieval)

Components

Service	Description	Port
n8n	Workflow automation — file detection & query routing	5678
Backend	FastAPI + Docling — chunking, embedding, hybrid search	5008
Qdrant	Vector database (dense + sparse vectors + metadata)	6333
PostgreSQL	Image storage, file hashes, chat history	5432 (internal), 5436 (host)
Ollama	Local LLM inference (Qwen3-4B Instruct)	11434
Open WebUI	Chat interface	3000

Setup

Prerequisites

Docker and Docker Compose
GPU with at least 16 GB VRAM recommended (e.g. NVIDIA RTX 4080)
32 GB+ system RAM

1. Clone and configure

git clone <repo-url> && cd multimodal-rag
cp .env.example .env
# Edit .env to adjust passwords, ports, or model choices

2. Choose your Docker Compose mode

Production (everything in Docker)

All services including Ollama and the Backend run inside Docker containers. This is the default.

docker compose up -d

Use docker-compose.yml. All containers communicate over the internal rag-network — no host.docker.internal needed.

Development (Backend + Ollama running locally)

Use this when you want to run the Backend and Ollama natively on your host (e.g. on macOS to access the GPU directly):

docker compose -f docker-compose.dev.yml up -d

This starts only n8n, PostgreSQL, Qdrant, and Open WebUI in Docker. You then run Ollama and the Backend yourself:

# Start Ollama natively
ollama serve

# Start the Backend
cd backend
pip install -e .
uvicorn src.main:app --host 0.0.0.0 --port 5008

In dev mode, update your .env to point containers at the host:

OLLAMA_BASE_URL_INTERNAL=http://host.docker.internal:11434
OLLAMA_BASE_URL_WEBUI=http://host.docker.internal:11434

3. Run the setup script

chmod +x setup.sh
./setup.sh

This will:

Start Docker services
Wait for all services to become healthy
Pull the configured Ollama models
Create the Qdrant collection
Pre-download the HuggingFace tokenizer

4. Import the n8n workflow

Open n8n at http://localhost:5678
Go to Workflows → Import
Import Multimodal RAG.json
Configure credentials for Ollama, Qdrant, and PostgreSQL

5. Add documents

Place your PDF/DOCX files into the documents/ folder and activate the ingestion workflow in n8n.

Web UIs

UI	URL	Purpose
n8n	http://localhost:5678	Workflow editor and monitoring
Open WebUI	http://localhost:3000	Chat interface for asking questions
Qdrant Dashboard	http://localhost:6333/dashboard	Inspect vector collections and data

Hardware Requirements

GPU: 16 GB+ VRAM (NVIDIA RTX 4080 or similar)
RAM: 32 GB+
OS: Linux, macOS, or Windows

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
backend		backend
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
Multimodal RAG.json		Multimodal RAG.json
README.md		README.md
docker-compose.dev.yml		docker-compose.dev.yml
docker-compose.yml		docker-compose.yml
init-db.sql		init-db.sql
multimodal-rag-architektur.excalidraw		multimodal-rag-architektur.excalidraw
opencode.json		opencode.json
opencode.lmstudio.json		opencode.lmstudio.json
setup.sh		setup.sh
table-rag-concept.excalidraw		table-rag-concept.excalidraw

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multimodal RAG System

What This Does

Document Ingestion

Question & Answer

Components

Setup

Prerequisites

1. Clone and configure

2. Choose your Docker Compose mode

Production (everything in Docker)

Development (Backend + Ollama running locally)

3. Run the setup script

4. Import the n8n workflow

5. Add documents

Web UIs

Hardware Requirements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Multimodal RAG System

What This Does

Document Ingestion

Question & Answer

Components

Setup

Prerequisites

1. Clone and configure

2. Choose your Docker Compose mode

Production (everything in Docker)

Development (Backend + Ollama running locally)

3. Run the setup script

4. Import the n8n workflow

5. Add documents

Web UIs

Hardware Requirements

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages