Mensch — European Film Incentives Intelligence

A self-updating Rust service that tracks European film production incentives. It crawls 50+ film commission websites, extracts structured data using Claude, stores everything in a SurrealDB knowledge graph with vector embeddings, and exposes a streaming chat interface powered by hybrid RAG.

Ask it things like:

"What's the best rebate for a €2.5M feature shooting in Germany?"
"Compare Ireland Section 481 vs. UK AVEC for a $10M co-production"
"Which European countries have over 30% rebates and no cultural test?"

How it works

50+ European film commission websites
          │
          ▼
  Fetch HTML → follow sub-page links if landing page is thin
          │
          ▼
  Claude extracts structured JSON
  (programs, rebate %, requirements, future changes, org details)
          │
          ├─ If scrape yields nothing → Claude bootstraps from training knowledge
          │                             (marked as unverified, overwritten when scrape improves)
          │
          ▼
  SurrealDB knowledge graph:
  ├─ incentive_program  (rebate %, thresholds, status, HNSW vector index)
  ├─ requirement        (linked to program)
  ├─ future_change      (linked to program)
  ├─ organization       (administered_by graph edge)
  ├─ country            (offers graph edge)
  └─ data_source        (crawl registry, failure tracking, auto-heal state)
          │
          ▼
  After each run:
  ├─ Failed URLs → crawl domain root, ask Claude to find correct page → retry
  ├─ Consecutive failures (10×) → auto-disable source
  └─ Claude suggests new sources not yet in the registry → added automatically
          │
          ▼
  User asks a question via chat
          │
          ▼
  Claude generates a structured SearchPlan
  (semantic query + country filters + program type + budget)
          │
          ├─► Vector similarity search (HNSW cosine, BGE-large-en-v1.5 local embeddings)
          └─► Structured fallback (SurrealQL filters on country, type, rebate %)
                    │
                    ▼
             Graph expansion: fetch requirements,
             future changes, org names per result
                    │
                    ▼
             Claude streams the answer with
             calculated EUR amounts and source citations

Ingestion runs on startup (configurable) and on a schedule (every 24 hours by default). Schema and seed data are applied automatically on boot, so the database always matches the current schema.
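The retrieval step in the diagram can be sketched in a few lines. Everything named here is hypothetical: the `SearchPlan` fields, the `Hit` type, and the `apply_country_filter` and `merge_hits` functions are illustrative, and the real queries run inside SurrealDB rather than in memory. The sketch shows one plausible reading of "hybrid": vector hits come first in similarity order, and structured matches fill in anything the vector search missed.

```rust
use std::collections::HashSet;

/// Hypothetical shape of the plan Claude produces for a question.
#[derive(Debug, Clone, Default)]
pub struct SearchPlan {
    pub semantic_query: String,
    pub countries: Vec<String>,       // country codes such as "de"
    pub program_type: Option<String>, // free-form label, assumed
    pub budget: Option<f64>,
}

/// A retrieved program, identified by its slug.
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub slug: String,
    pub country: String,
    pub rebate_pct: f64,
}

/// Keep only hits whose country is named in the plan.
/// An empty country list means "no filter".
pub fn apply_country_filter(plan: &SearchPlan, hits: Vec<Hit>) -> Vec<Hit> {
    if plan.countries.is_empty() {
        return hits;
    }
    hits.into_iter()
        .filter(|h| plan.countries.iter().any(|c| c.eq_ignore_ascii_case(&h.country)))
        .collect()
}

/// Vector hits first (assumed already ordered by similarity), then
/// structured hits not seen yet. Duplicates are dropped by slug.
pub fn merge_hits(vector: Vec<Hit>, structured: Vec<Hit>, limit: usize) -> Vec<Hit> {
    let mut seen = HashSet::new();
    vector
        .into_iter()
        .chain(structured)
        .filter(|h| seen.insert(h.slug.clone()))
        .take(limit)
        .collect()
}
```

Deduplicating by slug keeps the later graph expansion from fetching the same program's requirements twice.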

Requirements

Rust 1.85+ (edition 2024)
Docker + Docker Compose (for SurrealDB)
Anthropic API key — Claude is used for extraction, bootstrap, URL healing, source discovery, and chat
cargo-watch (optional, for make dev hot reload): cargo install cargo-watch

Embeddings run locally by default using BAAI/bge-large-en-v1.5 via the candle crate (~1.3 GB, downloaded once to ~/.cache/huggingface). No embedding API key required. OpenAI and Voyage AI are also supported.

Quick start

# 1. Clone and enter the directory
git clone <repo> mensch && cd mensch

# 2. Copy and fill in the env file
cp .env-example .env
# Required: set ANTHROPIC_API_KEY
# Everything else has sensible defaults

# 3. Start SurrealDB
make services

# 4. Seed the database (schema + initial data sources)
make db-init

# 5. Run
make run
# or with hot reload:
make dev

Open http://localhost:3000. On first boot the app runs ingestion, which takes 5–10 minutes while it crawls all sources and downloads and loads the local embedding model (~1.3 GB). Subsequent starts are fast because the model is cached and the data is already in the DB.

Configuration

All configuration is via environment variables (.env in development).

Variable	Default	Description
`ANTHROPIC_API_KEY`	required	Anthropic API key
`ANTHROPIC_MODEL`	`claude-sonnet-4-6`	Claude model
`EMBEDDING_PROVIDER`	`local`	`local`, `openai`, `voyage`, or `none`
`EMBEDDING_API_KEY`	—	API key for `openai` or `voyage` providers
`SURREAL_URL`	`localhost:8000`	SurrealDB host:port
`SURREAL_USER`	`root`	SurrealDB username
`SURREAL_PASS`	`root`	SurrealDB password
`SURREAL_NS`	`mensch`	SurrealDB namespace
`SURREAL_DB`	`mensch`	SurrealDB database name
`API_KEY`	—	Pre-shared key for API/WebSocket auth (leave empty to disable)
`PORT`	`3000`	HTTP port
`RUST_LOG`	`info`	Log level (`debug`, `info`, `warn`, `error`)
`INGESTION_ON_STARTUP`	`true`	Run a full scrape on boot
`INGESTION_INTERVAL_HOURS`	`24`	Hours between scheduled scrape runs
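As an illustration of how the defaults in the table might be applied, here is a minimal sketch that reads a handful of the variables with only the standard library. The `Settings` struct and its methods are hypothetical names, and falling back to the default on an unparsable value is an assumption; the actual service may reject bad values instead.

```rust
use std::env;

/// A few of the settings from the table above (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub surreal_url: String,
    pub port: u16,
    pub ingestion_on_startup: bool,
    pub ingestion_interval_hours: u64,
    /// `None` means API/WebSocket auth is disabled.
    pub api_key: Option<String>,
}

impl Settings {
    /// Build settings from any key lookup. Unset or unparsable values
    /// fall back to the documented defaults (assumed behaviour).
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Self {
        Settings {
            surreal_url: get("SURREAL_URL").unwrap_or_else(|| "localhost:8000".to_string()),
            port: get("PORT").and_then(|v| v.parse().ok()).unwrap_or(3000),
            ingestion_on_startup: get("INGESTION_ON_STARTUP")
                .map(|v| v.eq_ignore_ascii_case("true"))
                .unwrap_or(true),
            ingestion_interval_hours: get("INGESTION_INTERVAL_HOURS")
                .and_then(|v| v.parse().ok())
                .unwrap_or(24),
            // An empty API_KEY is treated the same as an unset one.
            api_key: get("API_KEY").filter(|k| !k.is_empty()),
        }
    }

    /// Read the same settings from the process environment.
    pub fn from_env() -> Self {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without touching the real environment.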

Ingestion pipeline

Each ingestion run:

Fetch: HTTP GET with browser-like headers; follows sub-page links if the landing page has fewer than 500 characters of content
Extract: Claude reads the text and returns structured JSON (program name, rebate %, requirements, future changes, administering org)
Bootstrap: if scraping yields no programs (for example on a JS-heavy site), Claude populates the record from training knowledge, marks it as unverified, and adds a note to verify at the official URL
Embed: BGE-large-en-v1.5 generates a 1024-dim vector for semantic search
Heal: 404 and DNS failures trigger a domain crawl, after which Claude picks the new URL and the registry is updated; TLS errors disable the source
Discover: Claude reviews the full source list and suggests new sources not yet tracked; they are added automatically for the next run
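The Fetch and Heal rules above, together with the 10-failure auto-disable from the overview, amount to a small decision table. The sketch below is illustrative only: the type and function names are hypothetical, trimming whitespace before counting characters is an assumption, and so is the `Retry` outcome for failures the README does not mention.

```rust
/// Landing pages with fewer characters than this trigger sub-page following.
pub const THIN_PAGE_CHARS: usize = 500;
/// Consecutive failures after which a source is auto-disabled.
pub const MAX_CONSECUTIVE_FAILURES: u32 = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchFailure {
    NotFound, // HTTP 404
    Dns,
    Tls,
    Other, // timeouts, 5xx, and so on (not covered by the README)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Heal,    // crawl the domain root and ask Claude for the new URL
    Disable, // stop crawling this source
    Retry,   // keep the source and try again on the next run (assumed)
}

/// True if the extracted text is too short to be worth sending to Claude as is.
pub fn is_thin(text: &str) -> bool {
    text.trim().chars().count() < THIN_PAGE_CHARS
}

/// Decide what to do after a failed fetch.
/// `consecutive_failures` includes the failure being handled.
pub fn next_action(failure: FetchFailure, consecutive_failures: u32) -> Action {
    if consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
        return Action::Disable;
    }
    match failure {
        FetchFailure::Tls => Action::Disable,
        FetchFailure::NotFound | FetchFailure::Dns => Action::Heal,
        FetchFailure::Other => Action::Retry,
    }
}
```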

Re-running ingestion is always safe — all writes are upserts.
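To see why upserts make re-runs safe, consider an in-memory stand-in for the program table. This is a sketch under assumptions: `Store` and `Program` are hypothetical, and keying on the slug is inferred from the `/api/programs/:slug` route rather than confirmed by the schema.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Program {
    pub slug: String,
    pub rebate_pct: f64,
    /// False for records bootstrapped from training knowledge.
    pub verified: bool,
}

/// In-memory stand-in for the `incentive_program` table.
#[derive(Debug, Default)]
pub struct Store {
    programs: HashMap<String, Program>,
}

impl Store {
    /// Insert or overwrite by slug. Returns true if a record was replaced.
    pub fn upsert(&mut self, program: Program) -> bool {
        self.programs.insert(program.slug.clone(), program).is_some()
    }

    pub fn get(&self, slug: &str) -> Option<&Program> {
        self.programs.get(slug)
    }

    pub fn len(&self) -> usize {
        self.programs.len()
    }

    pub fn is_empty(&self) -> bool {
        self.programs.is_empty()
    }
}
```

Running the same ingestion twice leaves one record per slug, and a later successful scrape simply overwrites an unverified bootstrap record.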

# Watch ingestion in detail
RUST_LOG=debug make run

Database management

make db-init     # drop + reapply schema + reseed (wipes all extracted data)
make db-seed     # reapply schema + seed without dropping (safe on running DB)
make db-schema   # apply schema changes only

The Makefile reads SURREAL_* env vars from your shell or .env. Defaults match the app's defaults (ns=mensch, db=mensch).

Production deployment

cp .env-example .env   # fill in real keys
make up                # docker compose up -d (app + SurrealDB)
make down              # stop
docker compose logs -f # follow logs

Data is persisted in a named Docker volume (surreal-data). The app container waits for SurrealDB to be healthy before starting.

API

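When API_KEY is set, all /api/* routes require an X-Api-Key header whose value matches it. The WebSocket accepts the key as a ?key= query parameter instead, which browser clients need because they cannot set custom headers on a WebSocket handshake. Leaving API_KEY empty disables the check.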

Method	Path	Description
`GET`	`/`	Chat UI
`GET`	`/healthcheck`	Service health, DB status, system metrics
`WS`	`/ws/chat`	Streaming chat (WebSocket)
`GET`	`/api/programs`	List programs (`?country=de&min_rebate=20&limit=50`)
`GET`	`/api/programs/:slug`	Single program with requirements and future changes
`GET`	`/api/countries`	All tracked countries
`GET`	`/api/changes`	Upcoming/future changes across all programs
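The pre-shared key rule for these routes could look roughly like the following. This is a sketch, not the service's middleware: `is_authorized`, `keys_match`, and `key_from_query` are hypothetical helpers, the query parser does no percent-decoding, and the comparison merely avoids an early exit rather than guaranteeing constant time.

```rust
/// Compare two keys without returning at the first differing byte.
fn keys_match(expected: &str, supplied: &str) -> bool {
    let (a, b) = (expected.as_bytes(), supplied.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// `configured` is the value of API_KEY; `supplied` is whatever the client
/// sent (the X-Api-Key header for /api/*, the `key` parameter for /ws/chat).
pub fn is_authorized(configured: Option<&str>, supplied: Option<&str>) -> bool {
    match configured {
        // An unset or empty API_KEY disables auth entirely.
        None | Some("") => true,
        Some(expected) => supplied.is_some_and(|s| keys_match(expected, s)),
    }
}

/// Pull `key` out of a raw query string such as "key=secret&x=1".
pub fn key_from_query(query: &str) -> Option<&str> {
    query.split('&').find_map(|pair| pair.strip_prefix("key="))
}
```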

WebSocket protocol

Client → server:

{ "type": "init", "session_id": null, "project_context": { "budget_usd": 2500000, "film_type": "feature" } }
{ "type": "message", "content": "What rebates are available in Germany?" }

Server → client:

{ "type": "session", "session_id": "abc123" }
{ "type": "sources", "sources": [ ... ] }
{ "type": "token", "content": "The " }
{ "type": "done" }
{ "type": "error", "message": "..." }

project_context is included in every LLM call — set budget_usd, film_type, shoot_country, or shoot_city to get personalised calculations.
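On the client side, the server messages above can be folded into a single reply. The sketch below assumes the JSON has already been decoded into a hypothetical `ServerEvent` enum (decoding, for example with a JSON library, is left out), and it records only the number of sources because the README does not specify the shape of each source entry.

```rust
/// One decoded server-to-client message (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum ServerEvent {
    Session { session_id: String },
    Sources { count: usize },
    Token { content: String },
    Done,
    Error { message: String },
}

/// The accumulated result of one chat turn.
#[derive(Debug, Default, PartialEq)]
pub struct Reply {
    pub session_id: Option<String>,
    pub source_count: usize,
    pub text: String,
    pub finished: bool,
    pub error: Option<String>,
}

/// Consume events until `Done` or `Error`, concatenating tokens in order.
pub fn collect_reply(events: impl IntoIterator<Item = ServerEvent>) -> Reply {
    let mut reply = Reply::default();
    for event in events {
        match event {
            ServerEvent::Session { session_id } => reply.session_id = Some(session_id),
            ServerEvent::Sources { count } => reply.source_count = count,
            ServerEvent::Token { content } => reply.text.push_str(&content),
            ServerEvent::Done => {
                reply.finished = true;
                break;
            }
            ServerEvent::Error { message } => {
                reply.error = Some(message);
                break;
            }
        }
    }
    reply
}
```

A client would keep the returned `session_id` and send it in the next `init` message to continue the same session, assuming that is what the field is for.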

Development

make test    # cargo test (in-memory SurrealDB, no running instance needed)
make lint    # cargo clippy + fmt check
make watch   # cargo watch with debug logging (requires running SurrealDB)

Stack

Layer	Technology
Language	Rust (edition 2024)
Web framework	Axum 0.8
Database	SurrealDB 3 (document + graph + vector)
LLM	Anthropic Claude (extraction, bootstrap, healing, discovery, chat)
Embeddings	BGE-large-en-v1.5 via candle (local, default) · OpenAI · Voyage AI
Scraping	`reqwest` + `scraper` (HTML → text, sub-page following)
Templating	Askama (server-side HTML)
Async runtime	Tokio
Auth	JWT-ready · pre-shared API key
