A LangGraph-based agent that autonomously discovers, scores, and tracks job opportunities against your CV profiles — and notifies you of the best matches.
- Loads context: reads your CV files (query/resume/), generates search queries deterministically from config/search_config.yaml (positions × locations cross-product), and loads target companies with their ATS hints
- Searches for jobs: one directive LLM prompt returns job URLs only (no fabricated descriptions); Tavily extract validates each URL and pulls real posting content (hallucinated or unreachable URLs are dropped); company ATS boards (Greenhouse, Lever, Ashby) are queried via direct API, so ATS lookups cost zero LLM tokens; all results are deduplicated and checkpointed to query/jobs_found.jsonl
- Scores matches: a single LLM call scores all jobs against your CV and keeps only jobs above a configurable threshold
- Stores results: deduplicates by content hash and writes to local JSON and/or cloud storage (Google Drive, OneDrive, Dropbox)
- Notifies you: sends a digest to Telegram, Slack, email, or WhatsApp
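The content-hash deduplication mentioned in the storage step can be pictured roughly as follows. This is a minimal sketch rather than the project's code: the helper names `content_hash` and `dedupe`, and the choice of `title`, `company`, and `url` as the identifying fields, are assumptions made for illustration.

```python
import hashlib
import json


def content_hash(job: dict) -> str:
    """Return a stable SHA-256 digest over the fields assumed to identify a posting."""
    # Hypothetical field names. Case and whitespace are normalised so that
    # trivial formatting differences do not yield distinct hashes.
    key = {
        field: " ".join(str(job.get(field, "")).lower().split())
        for field in ("title", "company", "url")
    }
    payload = json.dumps(key, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dedupe(jobs: list[dict], seen: set[str]) -> list[dict]:
    """Return jobs whose hash is not yet in `seen`, updating `seen` in place."""
    fresh = []
    for job in jobs:
        digest = content_hash(job)
        if digest not in seen:
            seen.add(digest)
            fresh.append(job)
    return fresh
```

Persisting `seen` between runs (for example next to the stored JSON) is what would keep duplicates from reappearing across runs.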
flowchart TD
A([run.py]) --> B[load_context]
B --> C{PDFs in resume/?}
C -- yes --> D[convert_cvs]
C -- no --> E{job_queries.md?}
D --> E
E -- no --> F[generate_queries\npositions × locations from search_config]
E -- yes --> G[search_jobs\nLLM directive → Tavily extract]
F --> G
G --> H[search_companies\nATS direct API]
H --> I[aggregate_jobs\ndedup · cap · jobs_found.jsonl]
I --> J2[analyze_jobs\nsingle LLM scoring call]
J2 --> J[store_results\nlocal JSON + cloud sync]
J --> K{notifications\nenabled?}
K -- yes --> L[send_notifications\nTelegram · Slack · email]
K -- no --> M([END])
L --> M
Every provider is swappable via the config/ files — LLM, search connectors, storage backend, and notification channels all follow the same factory pattern.
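A factory of this kind is often built as a small name-to-class registry. The sketch below illustrates the idea for notification channels only; the `register` decorator, the `make_notifier` function, and both notifier classes are hypothetical names, not the project's actual API.

```python
from typing import Callable, Protocol


class Notifier(Protocol):
    def send(self, message: str) -> str: ...


# Maps the provider name used in a config file to a zero-argument factory.
_REGISTRY: dict[str, Callable[[], Notifier]] = {}


def register(name: str):
    """Class decorator that records a provider under its config name."""
    def decorator(factory):
        _REGISTRY[name] = factory
        return factory
    return decorator


@register("telegram")
class TelegramNotifier:
    def send(self, message: str) -> str:
        return f"[telegram] {message}"  # placeholder: no network call here


@register("slack")
class SlackNotifier:
    def send(self, message: str) -> str:
        return f"[slack] {message}"  # placeholder: no network call here


def make_notifier(name: str) -> Notifier:
    """Instantiate the provider registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"unknown provider {name!r}; known: {known}") from None
```

With this shape, adding a provider means registering one more class, and the calling code only ever sees the name from the config file.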
Numbers from real pipeline runs against a senior product manager / data platform profile, Paris market:
| Metric | Value |
|---|---|
| Jobs discovered per run | ~19 unique postings |
| Jobs passing score threshold (≥ 70) | 15 |
| Top match score | 92 / 95 |
| Recommended to apply | 6 |
| Worth considering | 9 |
| Search queries run | 13 |
| Duplicate entries across runs | 0 (content-hash deduplication) |
Scoring uses a 0–95 scale; the maximum is capped at 95 to avoid inflated "perfect" scores. The LLM justifies each score in one sentence, which is stored alongside the job record.
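The cap and the threshold can be applied after the LLM returns its scores. The following is a sketch under assumptions: the record shape (`score`, `reason`) and the function names are hypothetical, and the threshold is treated as inclusive to match the "≥ 70" row in the table above.

```python
MAX_SCORE = 95  # upper bound of the scale described above
MIN_SCORE = 70  # default threshold, mirrors scoring.min_score


def clamp_score(raw: float) -> int:
    """Force a raw model score into the 0-95 range."""
    return max(0, min(MAX_SCORE, int(round(raw))))


def filter_matches(scored: list[dict], min_score: int = MIN_SCORE) -> list[dict]:
    """Keep jobs at or above the threshold, best match first."""
    kept = []
    for job in scored:
        score = clamp_score(job["score"])
        if score >= min_score:
            kept.append({**job, "score": score})
    return sorted(kept, key=lambda job: job["score"], reverse=True)
```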
# 1. Install
python3 -m venv .venv
.venv/bin/pip install -r requirements.txt
# 2. Configure secrets (project uses Infisical — no .env files)
# Install the Infisical CLI: https://infisical.com/docs/cli/overview
# Then add secrets to your Infisical project (env: dev):
# TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID — for notifications
# TAVILY_API_KEY — for URL validation and extraction (required)
# FRANCE_TRAVAIL_CLIENT_ID/SECRET — optional free job board API
# ADZUNA_APP_ID/KEY — optional free job board API
# 3. Add your CV
# Drop a PDF or .md file into query/resume/
# 4. Run
infisical run --env=dev -- python run.py
# Dry-run (scores jobs without writing to storage)
infisical run --env=dev -- python run.py --dry-run

Configuration is split across three files in the config/ folder:
| File | What goes here |
|---|---|
| config/config.yaml | Infrastructure: LLM provider, connectors, storage, notifications, logging |
| config/search_config.yaml | User preferences: target positions, locations, companies to monitor |
| config/score_config.yaml | Scoring: thresholds, uncertainty band, profiles directory |
run.py merges all three at startup. You only need to edit config/search_config.yaml for day-to-day use.
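How the three files are combined is not spelled out here. One common approach is a recursive merge in which later files override earlier ones. The sketch below assumes the YAML has already been parsed into plain dictionaries; `deep_merge` and `merge_configs` are hypothetical names, and the override order is an assumption.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins, merging nested dicts recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_configs(*configs: dict) -> dict:
    """Fold any number of parsed config dicts into one, left to right."""
    result: dict = {}
    for config in configs:
        result = deep_merge(result, config)
    return result
```

Because the three files use distinct top-level keys (`llm`, `locations`, `scoring`, and so on), conflicts should be rare in practice, but a recursive merge keeps the behaviour predictable if two files ever share a section.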
config/config.yaml (swap providers without touching code):
llm:
  provider: claude_code_agent # anthropic | openai | claude_code_agent
search:
  connectors:
    - name: anthropic_web # primary: LLM directive search → Tavily extract
      max_results_per_query: 4
    - name: france_travail # optional free API (francetravail.io)
      enabled: false
    - name: adzuna # optional free API (developer.adzuna.com)
      enabled: false
storage:
  provider: local # local | google_drive | onedrive | dropbox
notifications:
  channels: [telegram] # email | slack | telegram | whatsapp
logging:
  rotation: per_run # none | daily | per_run
  retention: 7

config/search_config.yaml (your search preferences):
cvs:
  cv1:
    - "Senior Product Manager"
    - "Head of Product"
  cv2:
    - "AI Product Manager"
    - "Product Lead"
locations:
  - "Paris"
  - "Remote"
companies:
  - "Mistral AI" # LLM discovers ATS on first run, result cached
  - name: "Hugging Face"
    hint: "greenhouse:huggingface" # skips LLM, uses ATS hint directly
  - name: "Criteo"
    url: "https://jobs.lever.co/criteo" # skips LLM, fetches URL directly

generate_queries builds a deterministic cross-product of positions × locations and writes query/job_queries.md with a hash header. No LLM call is made, and the result is cached until search_config.yaml changes.
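The cross-product and hash header could be implemented along these lines. This is a sketch, not the project's generate_queries: the "position location" query format, the HTML-comment header, and every function name here are assumptions.

```python
import hashlib
import json
from itertools import product


def build_queries(positions: list[str], locations: list[str]) -> list[str]:
    """Cross-product of positions and locations, in a stable order."""
    return [f"{position} {location}" for position, location in product(positions, locations)]


def config_hash(positions: list[str], locations: list[str]) -> str:
    """Short digest of the inputs; it changes whenever either list changes."""
    payload = json.dumps({"positions": positions, "locations": locations}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def render_queries_file(positions: list[str], locations: list[str]) -> str:
    """Text for the queries file: a hash header followed by one query per line."""
    header = f"<!-- config-hash: {config_hash(positions, locations)} -->"
    lines = [f"- {query}" for query in build_queries(positions, locations)]
    return "\n".join([header, *lines]) + "\n"


def is_stale(existing_text: str, positions: list[str], locations: list[str]) -> bool:
    """True when the cached file was generated from different inputs."""
    first_line = existing_text.splitlines()[0] if existing_text else ""
    return config_hash(positions, locations) not in first_line
```

Comparing the stored hash with a freshly computed one is what lets a run skip regeneration when the search preferences have not changed.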
config/score_config.yaml (scoring thresholds):
scoring:
  min_score: 70 # jobs below this are discarded (0–95 scale)

Each run produces:
- Live TUI: Rich terminal dashboard updates in place as the pipeline runs, showing node status, KPIs, and elapsed time per step
- Live web monitor: an in-process HTTP server serves a browser-based dashboard at http://127.0.0.1:8765/ for the duration of the run (see below)
- HTML report: after every run, logs/index.html (run list with Chart.js time-series chart + 10-column table) and logs/runs/run_*.html (per-run detail with pipeline table, token/cost per node, and job cards) are written automatically
- Log rotation: configurable via logging.rotation (none / daily / per_run) with a retention count
When the pipeline is launched via Claude Code in VS Code (which blocks the TUI), the live web monitor is the way to watch progress. run.py spawns a small http.server.ThreadingHTTPServer on 127.0.0.1:8765 and prints the URL on boot:
🌐 Live monitor: http://127.0.0.1:8765/ (run_id=abc12345)
The page polls /state.json every second, refreshes the pipeline table, token-spend block, and job cards in place, and stops polling automatically when the run finishes. The same HTML template is reused for the static post-run report at logs/runs/run_*.html (just without the JS poll block).
CLI flags (issue #62):
- --port N: override the default 8765 (must be 1024–65535).
- --no-monitor: skip the HTTP server entirely; the TUI and post-run report still work.
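The flags and the port-range check map naturally onto argparse. The sketch below shows one way to wire them; `port_number` and `build_parser` are hypothetical names, and the real run.py may parse its arguments differently.

```python
import argparse


def port_number(value: str) -> int:
    """argparse type: accept only unprivileged ports (1024-65535)."""
    port = int(value)  # a non-numeric value raises ValueError, which argparse reports
    if not 1024 <= port <= 65535:
        raise argparse.ArgumentTypeError(f"port must be 1024-65535, got {port}")
    return port


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--port", type=port_number, default=8765, metavar="N",
                        help="port for the live web monitor")
    parser.add_argument("--no-monitor", action="store_true",
                        help="skip the HTTP server entirely")
    parser.add_argument("--dry-run", action="store_true",
                        help="score jobs without writing to storage")
    return parser
```

An out-of-range value such as `--port 80` makes argparse print a usage error and exit before the pipeline starts.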
The server binds to 127.0.0.1 only by design — no authentication, no network exposure. It dies with run.py (daemon thread). If the port is busy the run continues without the monitor and logs a clear warning.
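The behaviour described above (loopback-only bind, daemon thread, graceful fallback when the port is busy) can be reproduced with the standard library alone. This is a sketch under assumptions: `STATE`, `MonitorHandler`, and `start_monitor` are hypothetical names, and the state payload is a placeholder.

```python
import json
import logging
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Placeholder run state; a real pipeline would update this as nodes finish.
STATE = {"run_id": "abc12345", "finished": False, "nodes": {}}
STATE_LOCK = threading.Lock()


class MonitorHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/state.json":
            self.send_error(404)
            return
        with STATE_LOCK:
            body = json.dumps(STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep per-request logging out of the terminal


def start_monitor(port: int = 8765):
    """Start the server on loopback only; return None if the port is unavailable."""
    try:
        server = ThreadingHTTPServer(("127.0.0.1", port), MonitorHandler)
    except OSError as exc:
        logging.warning("Live monitor disabled: port %d unavailable (%s)", port, exc)
        return None
    # A daemon thread ends with the main process, so no explicit shutdown is needed.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 rather than 0.0.0.0 is what keeps the unauthenticated endpoint off the network.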
Every LLM call is recorded with its token counts and dollar cost (issue #60). The data is surfaced in three places (issue #61):
- Live TUI footer: a compact line below the dashboard table refreshes at 4 Hz:
  Tokens: 14.2k in / 1.9k out · $0.42 · 8 calls
- Pipeline-end log line: one-line summary printed to stdout/log:
  Tokens: $0.42 total · 12345 in / 1876 out · 8 calls (sonnet $0.31, haiku $0.11)
- HTML report: logs/runs/run_*.html includes a Token spend block with grand total, per-model table, and a collapsed per-node breakdown; the pipeline table shows Tokens and Cost columns per node; logs/index.html adds a Chart.js time-series chart (6 selectable Y-axis metrics) and per-run columns for Status, Tokens consumed, and Cost $
Per-model and per-node totals are stored on the final state as token_usage (shape: {"by_model": {...}, "by_node": {...}, "grand_total": {...}}). Prices live in providers/llm/pricing.py and need a manual refresh when a vendor changes its rate card — the # Prices verified YYYY-MM-DD comment is the canary. Unknown models log a single warning and report $0.00 rather than crashing.
| Concern | Default |
|---|---|
| Orchestration | LangGraph |
| LLM interface | LangChain (Anthropic Claude / OpenAI) |
| Search | Claude web search (directive prompt) + Tavily extract (validation + content) |
| Job boards | France Travail, Adzuna (optional) |
| ATS boards | Greenhouse, Lever, Ashby (unauthenticated HTTP) |
| Terminal UI | Rich |
| Storage | Local JSON (alternatives: Google Drive, OneDrive, Dropbox) |
| Notifications | Telegram (alternatives: email, Slack, WhatsApp) |
| Secrets | Infisical |
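For the ATS boards row, a hint such as greenhouse:huggingface has to be turned into a request URL before any unauthenticated HTTP call can be made. The sketch below shows one way to do that mapping. The function names are hypothetical, and the endpoint templates reflect the vendors' public job-board APIs as best understood here; treat them as assumptions and check them against each vendor's documentation.

```python
from urllib.parse import quote

# Assumed public job-board endpoints; verify against vendor docs before use.
ATS_ENDPOINTS = {
    "greenhouse": "https://boards-api.greenhouse.io/v1/boards/{slug}/jobs",
    "lever": "https://api.lever.co/v0/postings/{slug}?mode=json",
    "ashby": "https://api.ashbyhq.com/posting-api/job-board/{slug}",
}


def parse_hint(hint: str) -> tuple[str, str]:
    """Split 'vendor:slug' and validate both parts."""
    vendor, separator, slug = hint.partition(":")
    vendor, slug = vendor.strip().lower(), slug.strip()
    if not separator or not slug or vendor not in ATS_ENDPOINTS:
        raise ValueError(f"unrecognised ATS hint: {hint!r}")
    return vendor, slug


def ats_url(hint: str) -> str:
    """Build the job-listing URL for a hint such as 'greenhouse:huggingface'."""
    vendor, slug = parse_hint(hint)
    return ATS_ENDPOINTS[vendor].format(slug=quote(slug, safe=""))
```

Because the URL is derived purely from the hint, no LLM call is needed for companies whose ATS is already known.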