feat(scoring): one-shot JSONL scoring + P1 prose fix (#75) by bjridicodes · Pull Request #81 · bayrem/AJSAA

bjridicodes · 2026-05-19T20:20:17Z

Summary

- **Fix #75 (P1)**: the Claude CLI was returning conversational prose instead of JSON, so scoring batches were skipped. A `SystemMessage` now frames the request as a task, `_is_prose()` fast-fails on responses that start with a letter (skipping the 120s parse timeout), and the retry sends a clean format-only prompt instead of echoing the broken prose back.
- **Simplify scoring to JSONL in/out**: `analyze_jobs` now reads from `query/jobs_found.jsonl` (the checkpoint written by `aggregate_jobs`) and writes `query/jobs_scored.jsonl`. One LLM call, no mode switching.
- **Delete hybrid/static modes**: `hybrid_scorer.py`, `static_scorer.py`, and `profile_store.py` are removed. Issues #13 (perf(scoring): calibrate hybrid/static profile quality and reduce LLM↔static divergence) and #14 (perf(tokens): reduce token consumption in LLM batch scoring) are closed as superseded by this direction.
- **Richer content in the prompt**: the description cap increases from 600 to 1000 chars, so the prompt uses more of Tavily's real extracted content.
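A minimal sketch of the prose fast-fail and clean retry described above, assuming a generic `call_llm(messages) -> str` callable. `Message`, `SYSTEM_PROMPT`, `FORMAT_ONLY_RETRY`, `is_prose`, and `score_with_retry` are hypothetical names standing in for `llm_scorer`'s `SystemMessage` framing and `_is_prose()`; the real prompts and signatures are not shown in this PR and may differ.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message such as a SystemMessage."""

    role: str
    content: str


# Hypothetical prompt texts; the real llm_scorer prompts are not shown here.
SYSTEM_PROMPT = "You are a scoring function. Reply with one JSON array and nothing else."
FORMAT_ONLY_RETRY = "Reply again with only the JSON array. No commentary."


def is_prose(text: str) -> bool:
    """Fast-fail check: JSON starts with '[' or '{', prose starts with a letter."""
    stripped = text.lstrip()
    return bool(stripped) and stripped[0].isalpha()


def score_with_retry(call_llm: Callable[[List[Message]], str], payload: str) -> list:
    # Frame the request as a task with a leading system message.
    messages = [Message("system", SYSTEM_PROMPT), Message("user", payload)]
    reply = call_llm(messages)
    if is_prose(reply):
        # Retry once with a format-only instruction; the prose reply is NOT sent back.
        reply = call_llm(messages + [Message("user", FORMAT_ONLY_RETRY)])
    # Raises json.JSONDecodeError if the retry is still not valid JSON.
    return json.loads(reply)
```

The letter-first heuristic would also flag a bare JSON literal such as `true`, which is harmless when the expected payload is always an array.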

What changed

| File | Change |
|---|---|
| `agent/nodes/analyze_jobs.py` | Read from JSONL, one code path, write scored JSONL |
| `providers/scoring/llm_scorer.py` | `SystemMessage` + prose fast-fail + clean retry + 1000-char cap |
| `providers/scoring/hybrid_scorer.py` | Deleted |
| `providers/scoring/static_scorer.py` | Deleted |
| `providers/scoring/profile_store.py` | Deleted |
| `tests/test_analyze_jobs.py` | Updated for new cap, message index, `SystemMessage` assertion, prose tests |
| `tests/test_hybrid_scorer.py` | Deleted |
| `tests/test_static_scorer.py` | Deleted |
| `tests/test_profile_store.py` | Deleted |

Test plan

- 205 tests pass (`pytest tests/ -v`)
- ruff: no issues
- mypy: no issues (61 source files)
- Manual: run `infisical run --env=dev -- python scripts/test_node.py analyze_jobs` with a populated `query/jobs_found.jsonl` to confirm that the JSONL is read, jobs are scored, and `query/jobs_scored.jsonl` is written

🤖 Generated with Claude Code

…ve call (closes #79)

Replace N keyword queries with one directive LLM call that carries full context: all target positions, all locations, and company ATS hints. Strict anti-hallucination rules forbid the LLM from generating URLs from memory or training data. Results are capped at 30 per run. URL validation now drops only network-unreachable domains (DNS or connection failure): ATS platforms return HTTP 200 for any path, whether or not the job exists, so status codes were not a reliable hallucination signal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
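The reachability rule in this commit can be sketched with the standard library alone. `is_network_reachable` and `filter_unreachable` are hypothetical helpers, not the project's code; the resolver and connector are injectable so the check can be exercised offline. No HTTP request is made, matching the reasoning that ATS hosts answer 200 for any path.

```python
import socket
from typing import Callable, List
from urllib.parse import urlparse


def is_network_reachable(
    url: str,
    resolve: Callable = socket.getaddrinfo,
    connect: Callable = socket.create_connection,
    timeout: float = 5.0,
) -> bool:
    """Hypothetical helper: drop a URL only on DNS or connection failure."""
    parsed = urlparse(url)
    host = parsed.hostname
    if not host:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        resolve(host, port)  # DNS failure raises socket.gaierror (an OSError)
        conn = connect((host, port), timeout)  # refused/unreachable raises OSError
        conn.close()
    except OSError:
        return False
    # HTTP status codes are deliberately never consulted.
    return True


def filter_unreachable(
    urls: List[str],
    resolve: Callable = socket.getaddrinfo,
    connect: Callable = socket.create_connection,
) -> List[str]:
    return [u for u in urls if is_network_reachable(u, resolve, connect)]
```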

After the LLM directive call returns URL candidates, run Tavily extract on every URL. URLs for which Tavily returns no content are dropped as hallucinated, stale, or unreachable. URLs that pass have their description replaced with the real posting content (up to 2000 chars). The LLM is now asked for max_results+20 candidates so that Tavily filtering doesn't leave us short of the 30-result target. Removed the unreliable HEAD-based URL validation; Tavily content extraction is the definitive signal. Degrades gracefully: if TAVILY_API_KEY is not set, the Tavily step is skipped and the LLM output is returned as-is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
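A sketch of the extract-and-filter step, assuming a hypothetical `extract(urls) -> {url: text}` callable in place of the real Tavily client, whose request and response shapes are not shown in this commit. `validate_with_extract`, `candidate_count`, `CONTENT_CAP`, and the `url`/`description` keys are illustrative.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical extractor: maps each URL to its extracted text ("" or missing if none).
Extractor = Callable[[List[str]], Dict[str, str]]

CONTENT_CAP = 2000  # max chars of real posting content kept per job


def candidate_count(max_results: int) -> int:
    """Over-request from the LLM so filtering doesn't leave the run short."""
    return max_results + 20


def validate_with_extract(
    candidates: List[dict],
    extract: Extractor,
    api_key: Optional[str],
    max_results: int = 30,
) -> List[dict]:
    if not api_key:
        # Degrade gracefully: without a key, return the LLM output as-is.
        return candidates
    contents = extract([c["url"] for c in candidates])
    validated = []
    for cand in candidates:
        text = (contents.get(cand["url"]) or "").strip()
        if not text:
            continue  # no content: treat as hallucinated, stale, or unreachable
        # Replace the LLM's description with real extracted content.
        validated.append({**cand, "description": text[:CONTENT_CAP]})
    return validated[:max_results]
```

A caller would pass something like `os.environ.get("TAVILY_API_KEY")` as `api_key`, which keeps the skip decision explicit and easy to test.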

…inct modules

- `web_search.py`: returns URL candidates only (`{url, source, found_in_snippet}`). The LLM now returns a URL-only JSON payload, with no fabricated descriptions.
- `url_validator.py` (new): Tavily extract validates URLs and drops hallucinated or unreachable ones (16/26 dropped in a live test), then builds job dicts from real extracted content plus URL-pattern metadata.
- `search_jobs.py`: calls both steps explicitly, search then validate, with separate log lines for each. Fixed a config path bug (`_get_positions` and locations were reading from the wrong key).

Live result: 26 LLM candidates → 10 Tavily-validated → 8 after semantic dedup. All 8 jobs carry 2000 chars of real extracted posting content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
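URL-pattern metadata might be recovered along these lines. The Greenhouse, Lever, and Ashby board URL shapes are assumptions about common public ATS layouts, not taken from `url_validator.py`, and `url_metadata`, `build_job`, and the output keys are hypothetical.

```python
import re

# Assumed URL shapes for a few public ATS boards (illustrative, not exhaustive).
ATS_PATTERNS = {
    "greenhouse": re.compile(
        r"^https?://(?:boards|job-boards)\.greenhouse\.io/(?P<company>[^/]+)/jobs/(?P<job_id>\d+)"
    ),
    "lever": re.compile(r"^https?://jobs\.lever\.co/(?P<company>[^/]+)/(?P<job_id>[0-9a-f-]+)"),
    "ashby": re.compile(r"^https?://jobs\.ashbyhq\.com/(?P<company>[^/]+)/(?P<job_id>[0-9a-f-]+)"),
}


def url_metadata(url: str) -> dict:
    """Return ATS name, company slug, and job id if the URL matches a known shape."""
    for ats, pattern in ATS_PATTERNS.items():
        match = pattern.match(url)
        if match:
            return {"ats": ats, "company": match.group("company"), "job_id": match.group("job_id")}
    return {"ats": None, "company": None, "job_id": None}


def build_job(candidate: dict, extracted: str, cap: int = 2000) -> dict:
    # candidate follows web_search.py's shape: {url, source, found_in_snippet}
    return {
        "url": candidate["url"],
        "source": candidate["source"],
        **url_metadata(candidate["url"]),
        "description": extracted[:cap],  # real extracted content, never LLM text
    }
```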



…zation header

The file had two docstrings concatenated without a closing triple-quote, causing a syntax error that failed ruff and mypy. It also had duplicate `search()` and `extract()` method definitions left over from the merge. Moved `api_key` from the request body to an `Authorization: Bearer` header (addresses a GitHub Advanced Security flag).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
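Moving the key into the header could look like the following with `urllib.request`. The endpoint URL and the `{"urls": [...]}` body are assumptions, so check Tavily's API reference before relying on them; `build_extract_request` is a hypothetical name, and it only builds the request without sending it.

```python
import json
import urllib.request
from typing import List

# Assumed endpoint; verify against Tavily's current API reference.
TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"


def build_extract_request(api_key: str, urls: List[str]) -> urllib.request.Request:
    # The key travels only in the Authorization header, never in the JSON body.
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping the secret out of the body means it is less likely to end up in logged or echoed request payloads.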

…75)

Three changes land together:

1. **Fix P1 (#75)**: `llm_scorer` now prepends a `SystemMessage` before the scoring payload so that the Claude CLI treats it as a task rather than a conversation. Added a prose fast-fail (`_is_prose`) that detects non-JSON output by its first character, bypassing the 120s timeout entirely. The retry sends a clean format-only prompt instead of echoing the broken prose back.
2. **JSONL-based scoring**: `analyze_jobs` reads jobs from `query/jobs_found.jsonl` (the checkpoint written by `aggregate_jobs`) rather than from LangGraph state. Scored output is written to `query/jobs_scored.jsonl`. This makes the scoring step independently runnable and makes the checkpoint file the single source of truth between search and scoring.
3. **Remove hybrid/static modes**: `hybrid_scorer.py`, `static_scorer.py`, and `profile_store.py` are deleted. `analyze_jobs` now has one code path: one LLM call for all jobs via `score_jobs_batch`. The mode-switching branches, the profile bootstrap loop, and the borderline escalation logic are gone. Issues #13 and #14 were closed as superseded. `cv_cache.py` is retained, since CV compression is still needed and is independent of scoring mode.

The description cap in the scoring prompt increased from 600 to 1000 chars to use Tavily's richer extracted content (now up to 2000 chars per job).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
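A sketch of the JSONL-in/JSONL-out flow, assuming one JSON object per line and a hypothetical `score_batch(jobs) -> list[dict]` callable that stands in for `score_jobs_batch` and returns one score dict per job in input order. `read_jsonl`, `write_jsonl`, `prompt_view`, and `score_from_checkpoint` are illustrative names. Only the copy sent to the LLM is trimmed to the 1000-char cap; the full description stays in the checkpoint and in the scored output.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List

DESCRIPTION_CAP = 1000  # per-job description cap in the scoring prompt (was 600)


def read_jsonl(path: Path) -> List[dict]:
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def write_jsonl(path: Path, rows: Iterable[dict]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")


def prompt_view(jobs: List[dict]) -> List[dict]:
    # Trim only the copy sent to the LLM; the checkpoint keeps the full text.
    return [{**j, "description": (j.get("description") or "")[:DESCRIPTION_CAP]} for j in jobs]


def score_from_checkpoint(
    query_dir: Path, score_batch: Callable[[List[dict]], List[dict]]
) -> List[dict]:
    jobs = read_jsonl(query_dir / "jobs_found.jsonl")
    scores = score_batch(prompt_view(jobs))  # one LLM call for every job
    if len(scores) != len(jobs):
        raise ValueError(f"expected {len(jobs)} scores, got {len(scores)}")
    scored = [{**job, **score} for job, score in zip(jobs, scores)]
    write_jsonl(query_dir / "jobs_scored.jsonl", scored)
    return scored
```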

bjridicodes and others added 9 commits May 19, 2026 18:53

docs(readme): reflect final search architecture — directive prompt + …

9f2f674

…Tavily extract

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/79-directive-search-prompt

a45d018

ci: trigger fresh checks on latest commit

5960079

docs(readme): remove rebase merge artifacts — deduplicated sections

4bcb02a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bayrem approved these changes May 19, 2026


bayrem merged commit a12d620 into main May 19, 2026
9 checks passed

bayrem merged 9 commits into main from feat/75-scoring-simplification

bjridicodes commented May 19, 2026 • edited by bayrem


2 participants
