Skip to content

feat(scoring): local LLM hybrid mode — Claude calibrates, Qwen2.5 scores subsequent runs #54

@bjridicodes

Description

@bjridicodes

Summary

Extend the hybrid scoring mode to use a local LLM (llama.cpp + Qwen2.5-Coder-1.5B) for subsequent runs after Claude has calibrated the scoring profile on the first pass. Claude acts as teacher on run 1, generating scored examples and reasoning. The local model acts as student on all subsequent runs, pattern-matching against those anchors.

Motivation

The current hybrid mode transitions from LLM scoring → static rule-based scoring after the first run. Static rules miss nuance — they can't adapt to new job types or edge cases. A local LLM with few-shot examples from Claude is richer signal: near-zero cost, no API dependency, and more flexible than hand-crafted rules.

Hardware constraint: VPS limits local model to 1.5B parameters. The few-shot anchoring approach makes this viable — Qwen2.5-Coder-1.5B doesn't need to reason from scratch, it pattern-matches against Claude's established scores.

Design

Run 1 — Claude (teacher)

  • Scores the first batch of jobs for a given CV as today
  • Additionally generates a set of scored examples with reasoning and saves them to scoring_profiles/[cv_name].json:
    {
      "criteria": ["senior PM title", "Paris or remote", "data/AI domain"],
      "red_flags": ["engineering-only role", "outside France"],
      "examples": [
        {"title": "...", "company": "...", "score": 88, "reasoning": "Strong data platform background, Paris, PM title confirmed"},
        {"title": "...", "company": "...", "score": 52, "reasoning": "Engineering role, no product ownership scope"}
      ]
    }

Run 2+ — Qwen2.5-Coder (student)

  • Loads profile + few-shot examples from scoring_profiles/[cv_name].json
  • Prompt: "Here is a CV profile and N scored examples. Score the following job using the same logic. Return JSON with score and one-line reasoning."
  • Qwen pattern-matches against Claude's anchors
  • Jobs within uncertainty_band escalate back to Claude for re-scoring

New scoring mode

Add local_llm as a valid value for scoring.mode, or extend hybrid to support a local_provider config key:

scoring:
  mode: hybrid
  local_provider: llamacpp         # used for subsequent runs instead of static rules
  uncertainty_band: [65, 85]       # escalate to cloud LLM if local score falls here

Integration

  • llama.cpp in server mode exposes an OpenAI-compatible API at localhost:8080/v1
  • New LlamaCppProvider follows the same pattern as openai_provider.py with custom base_url
  • No API key required — local inference only

What this replaces

Replaces the static rule-based second pass in hybrid mode. Static scoring stays available as a fallback if the local server isn't running.

Related

  • Extends the existing hybrid scorer (providers/scoring/llm_scorer.py, scoring_profiles/)
  • Related to feat/kimi-k2-provider branch — LlamaCppProvider shares the OpenAI-compatible base_url pattern already prototyped there
  • Related to the local LLM idea in wiki/projects/ajsaa/ideas/unimplemented/local_llm_for_mundane_tasks.md

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions