Summary
Extend the hybrid scoring mode to use a local LLM (llama.cpp + Qwen2.5-Coder-1.5B) for subsequent runs after Claude has calibrated the scoring profile on the first pass. Claude acts as teacher on run 1, generating scored examples and reasoning. The local model acts as student on all subsequent runs, pattern-matching against those anchors.
Motivation
The current hybrid mode transitions from LLM scoring → static rule-based scoring after the first run. Static rules miss nuance — they can't adapt to new job types or edge cases. A local LLM with few-shot examples from Claude is richer signal: near-zero cost, no API dependency, and more flexible than hand-crafted rules.
Hardware constraint: VPS limits local model to 1.5B parameters. The few-shot anchoring approach makes this viable — Qwen2.5-Coder-1.5B doesn't need to reason from scratch, it pattern-matches against Claude's established scores.
Design
Run 1 — Claude (teacher)
- Scores the first batch of jobs for a given CV as today
- Additionally generates a set of scored examples with reasoning and saves them to
scoring_profiles/[cv_name].json:
{
"criteria": ["senior PM title", "Paris or remote", "data/AI domain"],
"red_flags": ["engineering-only role", "outside France"],
"examples": [
{"title": "...", "company": "...", "score": 88, "reasoning": "Strong data platform background, Paris, PM title confirmed"},
{"title": "...", "company": "...", "score": 52, "reasoning": "Engineering role, no product ownership scope"}
]
}
Run 2+ — Qwen2.5-Coder (student)
- Loads profile + few-shot examples from
scoring_profiles/[cv_name].json
- Prompt: "Here is a CV profile and N scored examples. Score the following job using the same logic. Return JSON with score and one-line reasoning."
- Qwen pattern-matches against Claude's anchors
- Jobs within
uncertainty_band escalate back to Claude for re-scoring
New scoring mode
Add local_llm as a valid value for scoring.mode, or extend hybrid to support a local_provider config key:
scoring:
mode: hybrid
local_provider: llamacpp # used for subsequent runs instead of static rules
uncertainty_band: [65, 85] # escalate to cloud LLM if local score falls here
Integration
llama.cpp in server mode exposes an OpenAI-compatible API at localhost:8080/v1
- New
LlamaCppProvider follows the same pattern as openai_provider.py with custom base_url
- No API key required — local inference only
What this replaces
Replaces the static rule-based second pass in hybrid mode. Static scoring stays available as a fallback if the local server isn't running.
Related
- Extends the existing hybrid scorer (
providers/scoring/llm_scorer.py, scoring_profiles/)
- Related to
feat/kimi-k2-provider branch — LlamaCppProvider shares the OpenAI-compatible base_url pattern already prototyped there
- Related to the local LLM idea in
wiki/projects/ajsaa/ideas/unimplemented/local_llm_for_mundane_tasks.md
Summary
Extend the hybrid scoring mode to use a local LLM (llama.cpp + Qwen2.5-Coder-1.5B) for subsequent runs after Claude has calibrated the scoring profile on the first pass. Claude acts as teacher on run 1, generating scored examples and reasoning. The local model acts as student on all subsequent runs, pattern-matching against those anchors.
Motivation
The current hybrid mode transitions from LLM scoring → static rule-based scoring after the first run. Static rules miss nuance — they can't adapt to new job types or edge cases. A local LLM with few-shot examples from Claude is richer signal: near-zero cost, no API dependency, and more flexible than hand-crafted rules.
Hardware constraint: VPS limits local model to 1.5B parameters. The few-shot anchoring approach makes this viable — Qwen2.5-Coder-1.5B doesn't need to reason from scratch, it pattern-matches against Claude's established scores.
Design
Run 1 — Claude (teacher)
scoring_profiles/[cv_name].json:{ "criteria": ["senior PM title", "Paris or remote", "data/AI domain"], "red_flags": ["engineering-only role", "outside France"], "examples": [ {"title": "...", "company": "...", "score": 88, "reasoning": "Strong data platform background, Paris, PM title confirmed"}, {"title": "...", "company": "...", "score": 52, "reasoning": "Engineering role, no product ownership scope"} ] }Run 2+ — Qwen2.5-Coder (student)
scoring_profiles/[cv_name].jsonuncertainty_bandescalate back to Claude for re-scoringNew scoring mode
Add
local_llmas a valid value forscoring.mode, or extendhybridto support alocal_providerconfig key:Integration
llama.cppin server mode exposes an OpenAI-compatible API atlocalhost:8080/v1LlamaCppProviderfollows the same pattern asopenai_provider.pywith custombase_urlWhat this replaces
Replaces the static rule-based second pass in hybrid mode. Static scoring stays available as a fallback if the local server isn't running.
Related
providers/scoring/llm_scorer.py,scoring_profiles/)feat/kimi-k2-providerbranch —LlamaCppProvidershares the OpenAI-compatible base_url pattern already prototyped therewiki/projects/ajsaa/ideas/unimplemented/local_llm_for_mundane_tasks.md