A self-updating Rust service that tracks European film production incentives. It crawls 50+ film commission websites, extracts structured data using Claude, stores everything in a SurrealDB knowledge graph with vector embeddings, and exposes a streaming chat interface powered by hybrid RAG.
Ask it things like:
- "What's the best rebate for a €2.5M feature shooting in Germany?"
- "Compare Ireland Section 481 vs. UK AVEC for a $10M co-production"
- "Which European countries have over 30% rebates and no cultural test?"
50+ EU film commission websites
│
▼
Fetch HTML → follow sub-page links if landing page is thin
│
▼
Claude extracts structured JSON
(programs, rebate %, requirements, future changes, org details)
│
├─ If scrape yields nothing → Claude bootstraps from training knowledge
│ (marked as unverified, overwritten when scrape improves)
│
▼
SurrealDB knowledge graph:
├─ incentive_program (rebate %, thresholds, status, HNSW vector index)
├─ requirement (linked to program)
├─ future_change (linked to program)
├─ organization (administered_by graph edge)
├─ country (offers graph edge)
└─ data_source (crawl registry, failure tracking, auto-heal state)
│
▼
After each run:
├─ Failed URLs → crawl domain root, ask Claude to find correct page → retry
├─ Consecutive failures (10×) → auto-disable source
└─ Claude suggests new sources not yet in the registry → added automatically
│
▼
User asks a question via chat
│
▼
Claude generates a structured SearchPlan
(semantic query + country filters + program type + budget)
│
├─► Vector similarity search (HNSW cosine, BGE-large-en-v1.5 local embeddings)
└─► Structured fallback (SurrealQL filters on country, type, rebate %)
│
▼
Graph expansion: fetch requirements,
future changes, org names per result
│
▼
Claude streams the answer with
calculated EUR amounts and source citations
Ingestion runs on startup (configurable) and then on a schedule (every 24 hours by default). The schema and seed data are applied automatically on boot, so the database stays in sync with the application.
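The query path in the diagram above (filters plus vector ranking, with a structured fallback) can be illustrated with a small in-memory sketch. This is not the service's code: the real system delegates to SurrealDB's HNSW index and SurrealQL, whereas the version below is a brute-force stand-in. The `Program` record, the fields of `SearchPlan`, and the `search` function are hypothetical names chosen for illustration.

```rust
/// Hypothetical, simplified program record (field names are assumptions).
#[derive(Debug, Clone)]
struct Program {
    slug: String,
    country: String,     // lowercase country code, e.g. "de"
    rebate_pct: f32,
    embedding: Vec<f32>, // 1024-dim in the real service; tiny here
}

/// Hypothetical search plan: an optional query vector plus structured filters.
struct SearchPlan {
    query_embedding: Option<Vec<f32>>,
    countries: Vec<String>,
    min_rebate_pct: Option<f32>,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero-norm vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Structured filters: an empty country list means "any country".
fn matches_filters(p: &Program, plan: &SearchPlan) -> bool {
    let country_ok =
        plan.countries.is_empty() || plan.countries.iter().any(|c| c == &p.country);
    let rebate_ok = plan.min_rebate_pct.map_or(true, |min| p.rebate_pct >= min);
    country_ok && rebate_ok
}

/// Rank filtered programs by similarity when a query vector exists;
/// otherwise fall back to ordering by rebate percentage (highest first).
fn search<'a>(programs: &'a [Program], plan: &SearchPlan, limit: usize) -> Vec<&'a Program> {
    let mut hits: Vec<&Program> = programs
        .iter()
        .filter(|p| matches_filters(p, plan))
        .collect();
    match &plan.query_embedding {
        Some(q) => {
            hits.sort_by(|a, b| cosine(&b.embedding, q).total_cmp(&cosine(&a.embedding, q)))
        }
        None => hits.sort_by(|a, b| b.rebate_pct.total_cmp(&a.rebate_pct)),
    }
    hits.truncate(limit);
    hits
}
```

A linear scan like this is only reasonable for a handful of records. The HNSW index exists so that similarity search stays fast as the number of stored programs grows.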
- Rust 1.85+ (edition 2024)
- Docker + Docker Compose (for SurrealDB)
- Anthropic API key — Claude is used for extraction, bootstrap, URL healing, source discovery, and chat
- cargo-watch (optional, for `make dev` hot reload): `cargo install cargo-watch`
Embeddings run locally by default using BAAI/bge-large-en-v1.5 via the
candle crate (~1.3 GB, downloaded once to ~/.cache/huggingface). No embedding
API key required. OpenAI and Voyage AI are also supported.
# 1. Clone and enter the directory
git clone <repo> mensch && cd mensch
# 2. Copy and fill in the env file
cp .env-example .env
# Required: set ANTHROPIC_API_KEY
# Everything else has sensible defaults
# 3. Start SurrealDB
make services
# 4. Seed the database (schema + initial data sources)
make db-init
# 5. Run
make run
# or with hot reload:
make dev

Open http://localhost:3000. On first boot the app runs ingestion, which takes
5–10 minutes as it crawls all sources and the local embedding model loads for the
first time. Subsequent starts are fast (model cached, data already in DB).
All configuration is via environment variables (.env in development).
| Variable | Default | Description |
|---|---|---|
| ANTHROPIC_API_KEY | required | Anthropic API key |
| ANTHROPIC_MODEL | claude-sonnet-4-6 | Claude model |
| EMBEDDING_PROVIDER | local | local, openai, voyage, or none |
| EMBEDDING_API_KEY | (none) | API key for openai or voyage providers |
| SURREAL_URL | localhost:8000 | SurrealDB host:port |
| SURREAL_USER | root | SurrealDB username |
| SURREAL_PASS | root | SurrealDB password |
| SURREAL_NS | mensch | SurrealDB namespace |
| SURREAL_DB | mensch | SurrealDB database name |
| API_KEY | (none) | Pre-shared key for API/WebSocket auth (leave empty to disable) |
| PORT | 3000 | HTTP port |
| RUST_LOG | info | Log level (debug, info, warn, error) |
| INGESTION_ON_STARTUP | true | Run a full scrape on boot |
| INGESTION_INTERVAL_HOURS | 24 | Hours between scheduled scrape runs |
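As a rough illustration of how a few of these variables and their defaults might be read, here is a minimal sketch using only the standard library. The `Settings` struct, `parse_or`, and `from_lookup` are hypothetical names, not the project's actual configuration code. The accepted boolean spellings (`true`/`false`) are likewise an assumption.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical subset of the configuration table above.
#[derive(Debug, PartialEq)]
struct Settings {
    port: u16,
    ingestion_on_startup: bool,
    ingestion_interval_hours: u64,
    api_key: Option<String>, // None means auth is disabled
}

/// Parse `key` if it is set and non-empty, otherwise use `default`.
fn parse_or<T: FromStr>(
    get: &impl Fn(&str) -> Option<String>,
    key: &str,
    default: T,
) -> Result<T, String> {
    match get(key) {
        Some(raw) if !raw.trim().is_empty() => raw
            .trim()
            .parse()
            .map_err(|_| format!("invalid value for {key}: {raw:?}")),
        _ => Ok(default),
    }
}

impl Settings {
    /// Build settings from any key lookup, which keeps the logic testable.
    fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        Ok(Self {
            port: parse_or(&get, "PORT", 3000)?,
            ingestion_on_startup: parse_or(&get, "INGESTION_ON_STARTUP", true)?,
            ingestion_interval_hours: parse_or(&get, "INGESTION_INTERVAL_HOURS", 24)?,
            // An empty API_KEY is treated the same as an unset one.
            api_key: get("API_KEY").filter(|k| !k.is_empty()),
        })
    }

    /// Read from the process environment.
    fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure means tests can supply values without mutating the process environment.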
Each ingestion run:
- Fetch — HTTP GET with browser-like headers; follows sub-page links if the landing page has less than 500 chars of content
- Extract — Claude reads the text and returns structured JSON (program name, rebate %, requirements, future changes, administering org)
- Bootstrap — if scraping yields 0 programs (JS-heavy site, etc.), Claude populates the record from training knowledge with a note to verify at the official URL
- Embed — BGE-large-en-v1.5 generates a 1024-dim vector for semantic search
- Heal — 404/DNS failures trigger a domain crawl + Claude picks the new URL and updates the registry; TLS errors disable the source
- Discover — Claude reviews the full source list and suggests new sources not yet tracked; they're added automatically for the next run
Re-running ingestion is always safe — all writes are upserts.
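The failure-handling rules described above (heal on 404/DNS errors, disable on TLS errors, auto-disable after 10 consecutive failures, follow sub-pages when a page is thin) can be sketched as a small state machine. All type and function names here are hypothetical. How the real service combines these rules, for example whether a heal attempt counts toward the failure streak, is not stated in this README, so the ordering below is an assumption.

```rust
/// Thresholds taken from the README text.
const MAX_CONSECUTIVE_FAILURES: u32 = 10;
const THIN_PAGE_CHARS: usize = 500;

/// Hypothetical classification of a failed fetch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FetchFailure {
    NotFound, // HTTP 404
    Dns,
    Tls,
    Other,
}

/// What the ingestion run does next for this source.
#[derive(Debug, PartialEq)]
enum NextStep {
    Heal,         // crawl the domain root and look for the new URL
    Disable,      // stop crawling this source
    RetryNextRun, // leave as-is and try again on the next run
}

/// Hypothetical per-source state kept in the crawl registry.
#[derive(Debug)]
struct SourceState {
    consecutive_failures: u32,
    enabled: bool,
}

impl SourceState {
    fn new() -> Self {
        Self { consecutive_failures: 0, enabled: true }
    }

    /// A successful fetch resets the failure streak.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Count the failure and decide the next step.
    /// Assumption: TLS errors and the 10-failure limit take priority over healing.
    fn record_failure(&mut self, failure: FetchFailure) -> NextStep {
        self.consecutive_failures += 1;
        let step = if failure == FetchFailure::Tls
            || self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES
        {
            NextStep::Disable
        } else if matches!(failure, FetchFailure::NotFound | FetchFailure::Dns) {
            NextStep::Heal
        } else {
            NextStep::RetryNextRun
        };
        if step == NextStep::Disable {
            self.enabled = false;
        }
        step
    }
}

/// A landing page is "thin" when its extracted text is under 500 characters.
fn is_thin(text: &str) -> bool {
    text.chars().count() < THIN_PAGE_CHARS
}
```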
# Watch ingestion in detail
RUST_LOG=debug make run

make db-init # drop + reapply schema + reseed (wipes all extracted data)
make db-seed # reapply schema + seed without dropping (safe on running DB)
make db-schema # apply schema changes only

The Makefile reads SURREAL_* env vars from your shell or .env. Defaults match
the app's defaults (ns=mensch, db=mensch).
cp .env-example .env # fill in real keys
make up # docker compose up -d (app + SurrealDB)
make down # stop
docker compose logs -f # follow logs

Data is persisted in a named Docker volume (surreal-data). The app container
waits for SurrealDB to be healthy before starting.
When API_KEY is set, all /api/* routes require an X-Api-Key: <value> header matching it.
The WebSocket accepts the key as a ?key= query param (required for browser clients, which cannot set custom headers on WebSocket connections).
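The key check described here is simple enough to sketch in a few lines. The function names below are hypothetical and the code is independent of any web framework. It assumes an empty or unset API_KEY disables auth, as the configuration table states. The query parsing does no percent-decoding and the comparison is not constant-time, both of which a production implementation would likely want to address.

```rust
/// Extract the value of the `key` parameter from a raw query string
/// such as "foo=1&key=abc". No percent-decoding is performed.
fn key_from_query(query: &str) -> Option<&str> {
    query.split('&').find_map(|pair| pair.strip_prefix("key="))
}

/// Decide whether a request may proceed.
/// `configured` is the API_KEY setting; `presented` is the value from the
/// X-Api-Key header (HTTP) or the ?key= parameter (WebSocket).
fn is_authorized(configured: Option<&str>, presented: Option<&str>) -> bool {
    match configured {
        // Unset or empty key: auth is disabled, everything is allowed.
        None | Some("") => true,
        // Otherwise the presented key must be present and match exactly.
        Some(expected) => presented == Some(expected),
    }
}
```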
| Method | Path | Description |
|---|---|---|
| GET | / | Chat UI |
| GET | /healthcheck | Service health, DB status, system metrics |
| WS | /ws/chat | Streaming chat (WebSocket) |
| GET | /api/programs | List programs (?country=de&min_rebate=20&limit=50) |
| GET | /api/programs/:slug | Single program with requirements and future changes |
| GET | /api/countries | All tracked countries |
| GET | /api/changes | Upcoming/future changes across all programs |
Client → server:
{ "type": "init", "session_id": null, "project_context": { "budget_usd": 2500000, "film_type": "feature" } }
{ "type": "message", "content": "What rebates are available in Germany?" }

Server → client:
{ "type": "session", "session_id": "abc123" }
{ "type": "sources", "sources": [ ... ] }
{ "type": "token", "content": "The " }
{ "type": "done" }
{ "type": "error", "message": "..." }

project_context is included in every LLM call. Set budget_usd, film_type,
shoot_country, or shoot_city to get personalised calculations.
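From a client's point of view, the server messages above form a small event stream: tokens accumulate into the answer until a `done` or `error` message arrives. The sketch below models that with plain Rust types and no JSON or WebSocket library. `ServerEvent`, `ChatTurn`, and `collect_turn` are hypothetical names. The shape of the `sources` payload is not specified in this README, so it is left out. Treating a stream that ends without `done` as an error is an assumption.

```rust
/// Server-to-client messages, mirroring the "type" values listed above.
#[derive(Debug, Clone, PartialEq)]
enum ServerEvent {
    Session { session_id: String },
    Sources, // payload shape is not documented here, so it is omitted
    Token { content: String },
    Done,
    Error { message: String },
}

/// The result of one completed question/answer exchange.
#[derive(Debug, Default, PartialEq)]
struct ChatTurn {
    session_id: Option<String>,
    answer: String,
}

/// Fold a stream of events into a finished turn.
/// Returns Err with the server's message on an error event, or a generic
/// message if the stream ends before a done event (an assumption).
fn collect_turn(events: impl IntoIterator<Item = ServerEvent>) -> Result<ChatTurn, String> {
    let mut turn = ChatTurn::default();
    for event in events {
        match event {
            ServerEvent::Session { session_id } => turn.session_id = Some(session_id),
            ServerEvent::Sources => {}
            ServerEvent::Token { content } => turn.answer.push_str(&content),
            ServerEvent::Done => return Ok(turn),
            ServerEvent::Error { message } => return Err(message),
        }
    }
    Err("stream ended before a done event".to_string())
}
```

In a real client each incoming WebSocket text frame would first be deserialized into one of these variants. A UI would typically render each token as it arrives rather than waiting for the full turn.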
make test # cargo test (in-memory SurrealDB, no running instance needed)
make lint # cargo clippy + fmt check
make watch # cargo watch with debug logging (requires running SurrealDB)

| Layer | Technology |
|---|---|
| Language | Rust (edition 2024) |
| Web framework | Axum 0.8 |
| Database | SurrealDB 3 (document + graph + vector) |
| LLM | Anthropic Claude (extraction, bootstrap, healing, discovery, chat) |
| Embeddings | BGE-large-en-v1.5 via candle (local, default) · OpenAI · Voyage AI |
| Scraping | reqwest + scraper (HTML → text, sub-page following) |
| Templating | Askama (server-side HTML) |
| Async runtime | Tokio |
| Auth | JWT-ready · pre-shared API key |