Full-stack RAG chatbot for answering questions about administrative procedures (e.g. civil documents, ID cards, passports). It uses a FastAPI backend, a Vite + React frontend, Hugging Face (or OpenAI) for LLM calls, and a local vector DB for retrieval. Optional OCR (Tesseract) is available for image uploads.
- Multilingual Q&A — French and Arabic (and other languages the LLM supports)
- Retrieval-augmented generation (RAG) — Local vector store + optional reranking
- Dual LLM backends — Hugging Face Router (default) or OpenAI
- Image OCR — Upload a document image; text is extracted with Tesseract and can be used in the conversation
- Docker-based — Dev and production setups via Docker Compose
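Once the stack is up (see Quick start below), the backend can be exercised outside the frontend. A minimal client sketch; note that the `/chat` route and the `{"question": ...}` payload are assumptions made here for illustration, the real contract is defined by the routes in `backend/main.py`:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # NOTE: "/chat" and the {"question": ...} payload are assumptions;
    # check the routes defined in backend/main.py for the real contract.
    payload = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def ask(question: str) -> dict:
    # Send the question and decode the JSON reply.
    with urllib.request.urlopen(build_request(question)) as resp:
        return json.loads(resp.read())
```

Usage: `ask("Comment obtenir un extrait d'acte de naissance ?")`.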
```
Chatbot-mso/
├── backend/                  # FastAPI app (LLM, RAG, OCR)
│   ├── app/
│   │   ├── core/             # Prompts, config
│   │   └── services/         # LLM, OCR, vector retrieval
│   ├── main.py
│   ├── Dockerfile
│   └── Dockerfile.dev
├── frontend/                 # Vite + React + Tailwind
│   ├── src/
│   ├── Dockerfile
│   └── Dockerfile.dev
├── data-pipeline/            # Builds vector DB from source data
├── data/
│   └── vectordb/             # Vector DB (created or downloaded)
├── deployment/
│   └── scripts/              # start.sh, stop.sh (production)
├── docker-compose-dev.yml    # Development (hot reload)
├── docker-compose-prod.yml   # Production
├── run.sh                    # Dev: build vectordb if needed, then docker-compose-dev up
└── .env / .env.example
```
- Docker and Docker Compose (v2)
- (Optional) Prebuilt vector DB — speeds up first run; see Vector DB below
```bash
git clone <your-repo-url>
cd Chatbot-mso
cp .env.example .env
```
Edit `.env` and set at least:
- Hugging Face (default): `HF_TOKEN=your-hf-token`
- Or OpenAI: `OPENAI_API_KEY=your-openai-key`, and switch the backend in `backend/main.py` (see Choose LLM backend).
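A missing or empty API key only surfaces once a request reaches the LLM, so it can help to sanity-check `.env` before starting the containers. A minimal sketch; `parse_dotenv` and `has_llm_key` are illustrative helpers, not part of the repo:

```python
def parse_dotenv(text: str) -> dict:
    # Tiny .env parser: KEY=VALUE lines; blanks and "#" comments are ignored.
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def has_llm_key(env: dict) -> bool:
    # At least one of the two keys must be non-empty,
    # depending on which backend you chose.
    return any(env.get(k) for k in ("HF_TOKEN", "OPENAI_API_KEY"))
```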
You need a populated `data/vectordb` directory. Either:
- Download prebuilt (recommended): Prebuilt vectordb (MediaFire). Unzip and put the contents directly in `data/vectordb`.
- Or run without it: `run.sh` will build the vector DB via the data-pipeline (slower, first time only).
From the repo root, run `./run.sh`. This will:
- Build the vector DB with the data-pipeline image if `data/vectordb` is missing or empty
- Start the backend and frontend with `docker-compose-dev.yml` (hot reload)
Ports:
| Service | URL |
|---|---|
| Frontend | http://localhost:5173 |
| Backend | http://localhost:8000 |
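After `./run.sh`, the backend may take a few seconds to come up. A small polling sketch; `wait_until_up` is an illustrative helper, not part of the repo. FastAPI serves its interactive docs at `/docs` by default, which makes a convenient readiness probe:

```python
import time

def wait_until_up(probe, attempts: int = 30, delay: float = 0.5) -> bool:
    # probe: zero-arg callable returning True once the service answers, e.g.
    #   lambda: urllib.request.urlopen("http://localhost:8000/docs").status == 200
    for _ in range(attempts):
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused while the container is still starting
        time.sleep(delay)
    return False
```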
- Stop: `docker compose -f docker-compose-dev.yml down`
- Start: `./run.sh` (or `docker compose -f docker-compose-dev.yml up -d` if the vector DB already exists)
- Logs: `docker compose -f docker-compose-dev.yml logs -f`
- Rebuild after dependency changes: `docker compose -f docker-compose-dev.yml up -d --build`
Code in backend/ and frontend/src/ is mounted into the containers, so edits are reflected without rebuilding (hot reload).
- Backend: Python 3.12; install deps from `backend/requirements.txt` and `requirements.base.txt`, set `VECTORDB_PATH` to `./data/vectordb`, then run `uvicorn` from `backend/` (e.g. `uvicorn main:app --reload`).
- Frontend: from `frontend/`, run `npm install && npm run dev`. The app expects the API at `http://localhost:8000` (see `frontend/src/services/api.js`).
Default: Hugging Face Router via `backend/app/services/llm_service_huggingface.py`.
Switch to OpenAI:
- Add to `.env`: `OPENAI_API_KEY=your-openai-key`
- In `backend/main.py`, change the import to `from app.services.llm_service_openai import LLMService, SpecializedLLM`, and comment out or remove the `llm_service_huggingface` import.
To switch back to Hugging Face, use `from app.services.llm_service_huggingface import LLMService, SpecializedLLM`.
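If you switch backends often, editing `main.py` each time gets tedious; the import could instead be driven by an environment variable. A sketch of that alternative (not how the repo currently works): `LLM_BACKEND` is a hypothetical variable that is not in `.env.example`, and `pick_backend_module` is an illustrative helper:

```python
import importlib
import os

def pick_backend_module(env=None) -> str:
    # "LLM_BACKEND" is hypothetical (not in .env.example); defaults to the
    # Hugging Face backend, matching the shipped main.py.
    env = os.environ if env is None else env
    name = env.get("LLM_BACKEND", "huggingface")
    if name not in ("huggingface", "openai"):
        raise ValueError(f"unknown LLM backend: {name}")
    return f"app.services.llm_service_{name}"

# In backend/main.py one could then do:
#   mod = importlib.import_module(pick_backend_module())
#   LLMService, SpecializedLLM = mod.LLMService, mod.SpecializedLLM
```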
| Variable | Description |
|---|---|
| `HF_TOKEN` | Hugging Face API token (required for the HF backend) |
| `OPENAI_API_KEY` | OpenAI API key (required for the OpenAI backend) |
| `HF_MODEL` / `OPENAI_MODEL` | Model name |
| `HF_TEMPERATURE` / `OPENAI_TEMPERATURE` | Sampling temperature |
| `RETRIEVAL_K` | Number of chunks to retrieve |
| `RETRIEVAL_RERANK_TOP_N` | Number of chunks kept after reranking |
| `RERANK_ENABLED` | Enable reranking (`1` or `0`) |
| `MAX_CONTEXT_CHARS` | Max characters of context sent to the LLM |
| `MAX_QUESTION_LENGTH` | Max question length |
See .env.example for the full list and defaults.
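The retrieval knobs above interact roughly as follows. A toy in-memory sketch of the RAG step, not the backend's actual vector-store code (which lives under `backend/app/services/`): rank chunks by similarity, keep the top `RETRIEVAL_K`, and stop adding context at `MAX_CONTEXT_CHARS`:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_context(query_vec, chunks, k=4, max_context_chars=2000):
    # chunks: list of (embedding, text) pairs. k plays the role of
    # RETRIEVAL_K; max_context_chars plays the role of MAX_CONTEXT_CHARS.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    parts, used = [], 0
    for _, text in ranked[:k]:
        if used + len(text) > max_context_chars:
            break
        parts.append(text)
        used += len(text)
    return "\n".join(parts)
```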
- Prebuilt: download from the link above and unzip into `data/vectordb`.
- Build from scratch: run `./run.sh` with an empty (or missing) `data/vectordb`; the data-pipeline container will index and populate it. This can take a long time (e.g. hours) depending on data size.
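The "missing or empty" condition that decides whether a rebuild happens can be expressed in a few lines. A sketch of that check; `vectordb_ready` is an illustrative helper, not the actual code in `run.sh`:

```python
from pathlib import Path

def vectordb_ready(path: str = "data/vectordb") -> bool:
    # Rebuild is needed only when the directory is missing or empty,
    # mirroring the condition run.sh applies before starting the pipeline.
    p = Path(path)
    return p.is_dir() and any(p.iterdir())
```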
- Start: `./deployment/scripts/start.sh` (uses `docker-compose-prod.yml`)
- Stop: `./deployment/scripts/stop.sh`
The production frontend is served on port 8080; the backend remains on 8000.
You can try questions like:
- Comment obtenir un extrait d'acte de naissance ? (How do I obtain a copy of a birth certificate?)
- كيف أحصل على نسخة من رسم الولادة؟ (How do I get a copy of the birth record?)
- ما هي الوثائق المطلوبة لتجديد البطاقة الوطنية؟ (Which documents are required to renew the national ID card?)
- نسخة من السجل العدلي بالنسبة للمغاربة المقيمين بالخارج (Criminal record extract for Moroccans residing abroad)
- الحصول على جواز السفر البيومتري بالنسبة للقاصرين أقل من 12 سنة بالمغرب (Obtaining a biometric passport for minors under 12 in Morocco)
- تجديد البطاقة الوطنية للتعريف (Renewing the national identity card)