feat(scoring): one-shot JSONL scoring + P1 prose fix (#75) #81
Merged
…ve call (closes #79)
Replace N keyword queries with one directive LLM call that carries full context: all target positions, all locations, and company ATS hints. Strict anti-hallucination rules forbid the LLM from generating URLs from memory or training data. Results are capped at 30 per run.
URL validation now drops only network-unreachable domains (DNS or connection failure). ATS platforms return HTTP 200 for any path regardless of whether the job exists, so status codes are not a reliable hallucination signal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
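The "drop only on DNS or connection failure" rule can be sketched as below. This is a minimal illustration under stated assumptions, not the PR's code: `is_network_reachable`, `filter_candidates`, and the injectable `connect` hook are hypothetical names, the check opens a TCP connection instead of sending an HTTP request, and it assumes HTTPS on port 443. The HTTP status is never consulted, for the reason given above.

```python
import socket
from typing import Callable
from urllib.parse import urlsplit

MAX_RESULTS = 30  # per-run cap from the commit message

# Hypothetical hook so the check can be exercised without network access.
Connector = Callable[[str, int], None]


def _default_connect(host: str, port: int) -> None:
    # create_connection() resolves DNS and opens a TCP socket; both failure
    # modes raise OSError (socket.gaierror is an OSError subclass).
    with socket.create_connection((host, port), timeout=5):
        pass


def is_network_reachable(url: str, connect: Connector = _default_connect) -> bool:
    """Return False only for DNS or connection failures.

    HTTP status codes are deliberately not checked: ATS platforms answer
    200 for any path, so a 200 says nothing about whether the job exists.
    """
    host = urlsplit(url).hostname
    if not host:
        return False  # nothing to resolve, e.g. a bare string
    try:
        connect(host, 443)  # assumes HTTPS
    except OSError:
        return False
    return True


def filter_candidates(urls: list[str], connect: Connector = _default_connect) -> list[str]:
    """Keep reachable URLs in order, stopping once MAX_RESULTS are found."""
    kept: list[str] = []
    for url in urls:
        if len(kept) == MAX_RESULTS:
            break
        if is_network_reachable(url, connect):
            kept.append(url)
    return kept
```

Injecting `connect` keeps the policy (what counts as unreachable) separate from the I/O, so the rule can be tested offline with a fake connector.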
After the LLM directive call returns URL candidates, run Tavily extract on every URL. URLs for which Tavily returns no content are dropped as hallucinated, stale, or unreachable. URLs that pass have their description replaced with the real posting content (up to 2000 chars).
The LLM is now asked for max_results+20 candidates so that Tavily filtering does not leave the run short of the 30-result target. Removed the unreliable HEAD-based URL validation; Tavily content extraction is the definitive signal.
Degrades gracefully: if TAVILY_API_KEY is not set, the Tavily step is skipped and the LLM output is returned as-is.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
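A minimal sketch of the extract-and-filter step, assuming a hypothetical `Extractor` callable that maps each URL to its extracted text (empty or missing when extraction fails). It does not call the Tavily client; the names `validate_candidates`, `llm_candidate_count`, and `make_extractor_if_configured` are illustrative only. It shows the three rules from the commit: drop contentless URLs, replace the description with real content capped at 2000 chars, and skip the step when `TAVILY_API_KEY` is unset.

```python
import os
from typing import Callable, Optional

MAX_RESULTS = 30   # per-run target
OVERFETCH = 20     # LLM is asked for max_results + 20 candidates
DESC_CHARS = 2000  # cap on extracted description length

# Hypothetical extractor signature: URLs in, {url: extracted text} out.
# A missing or empty entry means extraction produced no content.
Extractor = Callable[[list[str]], dict[str, str]]


def llm_candidate_count(max_results: int = MAX_RESULTS) -> int:
    """Over-request so filtering still leaves enough results."""
    return max_results + OVERFETCH


def make_extractor_if_configured(factory: Callable[[str], Extractor]) -> Optional[Extractor]:
    """Return an extractor only when TAVILY_API_KEY is set."""
    key = os.environ.get("TAVILY_API_KEY")
    return factory(key) if key else None


def validate_candidates(candidates: list[dict], extract: Optional[Extractor],
                        max_results: int = MAX_RESULTS) -> list[dict]:
    if extract is None:
        # Graceful degradation: no key, so the LLM output is returned as-is.
        return candidates
    contents = extract([c["url"] for c in candidates])
    kept = []
    for c in candidates:
        text = (contents.get(c["url"]) or "").strip()
        if not text:
            continue  # no content: hallucinated, stale, or unreachable
        # Replace the LLM-written description with the real posting content.
        kept.append({**c, "description": text[:DESC_CHARS]})
    return kept[:max_results]
```

Passing the extractor in (rather than constructing a client inside the function) makes the "no key" path a plain `None` check and keeps the filter testable without API access.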
…inct modules
web_search.py: returns URL candidates only ({url, source, found_in_snippet}).
The LLM now returns a URL-only JSON payload, with no fabricated descriptions.
url_validator.py (new): uses Tavily extract to validate URLs, drops hallucinated
or unreachable ones (16 of 26 dropped in a live test), and builds job dicts from
real extracted content plus URL-pattern metadata.
search_jobs.py: calls both steps explicitly (search, then validate) with
separate log lines for each. Fixed a config path bug: _get_positions and
locations were reading from the wrong key.
Live result: 26 LLM candidates → 10 Tavily-validated → 8 after semantic
dedup. All 8 jobs carry 2000 chars of real extracted posting content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
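"URL-pattern metadata" can be illustrated with a small host table. This is a sketch under assumptions: the patterns below cover a few common ATS platforms (Greenhouse, Lever, and Ashby put the company slug first in the path; Workday puts it in the subdomain), and the real table in `url_validator.py` is not shown in this PR. `url_metadata` and `build_job` are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed host -> platform table; not taken from url_validator.py.
ATS_HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "job-boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}


def url_metadata(url: str) -> dict[str, Optional[str]]:
    """Infer ATS platform and company slug from the URL alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    if host.endswith(".myworkdayjobs.com"):
        # Workday puts the company in the subdomain: acme.wd5.myworkdayjobs.com
        return {"ats": "workday", "company": host.split(".")[0]}
    ats = ATS_HOSTS.get(host)
    # For the table's platforms, the company slug is the first path segment.
    company = segments[0] if ats and segments else None
    return {"ats": ats, "company": company}


def build_job(candidate: dict, extracted: str, max_chars: int = 2000) -> dict:
    """Combine a search candidate, URL metadata, and extracted posting text."""
    return {
        "url": candidate["url"],
        "source": candidate.get("source"),
        "found_in_snippet": candidate.get("found_in_snippet"),
        **url_metadata(candidate["url"]),
        "description": extracted[:max_chars],
    }
```

Unknown hosts yield `None` for both fields rather than a guess, which matches the commit's aim of building job dicts only from verifiable data.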
…Tavily extract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…zation header
The file had two docstrings concatenated without a closing triple-quote, causing a syntax error that failed ruff/mypy. It also had duplicate search() and extract() method definitions left over from the merge.
Moved api_key from the request body to an Authorization: Bearer header (addresses a GitHub Advanced Security flag).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
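Moving the key from the body to a header looks roughly like this with the standard library. The endpoint URL and the `{"urls": [...]}` payload shape are assumptions for illustration; check them against Tavily's API reference before relying on them. `build_extract_request` is a hypothetical name, and the function only builds the request without sending it.

```python
import json
import urllib.request

TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"  # assumed endpoint


def build_extract_request(api_key: str, urls: list[str]) -> urllib.request.Request:
    """Build a POST request that carries the key only in the Authorization header."""
    body = json.dumps({"urls": urls}).encode("utf-8")  # no api_key in the body
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(req, timeout=30)`. Keeping the secret out of the JSON body means it is not captured if request payloads are logged or echoed in error messages.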
…75)
Three changes land together:

1. **Fix P1 (#75)**: `llm_scorer` now prepends a `SystemMessage` before the scoring payload so the Claude CLI treats it as a task, not a conversation. Added a prose fast-fail (`_is_prose`) that detects non-JSON output by its first character, bypassing the 120s timeout entirely. The retry sends a clean format-only prompt instead of echoing the broken prose back.
2. **JSONL-based scoring**: `analyze_jobs` reads jobs from `query/jobs_found.jsonl` (the checkpoint written by `aggregate_jobs`) rather than from LangGraph state. Scored output is written to `query/jobs_scored.jsonl`. This makes the scoring step independently runnable and the checkpoint file the single source of truth between search and scoring.
3. **Remove hybrid/static modes**: `hybrid_scorer.py`, `static_scorer.py`, and `profile_store.py` are deleted. `analyze_jobs` now has one code path: one LLM call for all jobs via `score_jobs_batch`. The mode-switching branches, the profile bootstrap loop, and the borderline escalation logic are gone. Issues #13 and #14 were closed as superseded. `cv_cache.py` is retained: CV compression is still needed and is independent of scoring mode.

The description cap in the scoring prompt increased from 600 → 1000 chars to use Tavily's richer extracted content (now up to 2000 chars per job).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bayrem approved these changes on May 19, 2026
Summary
- `SystemMessage` framing forces task mode; `_is_prose()` fast-fails on letter-first responses (skipping the 120s parse timeout); the retry sends a clean format-only prompt instead of echoing broken prose back.
- `analyze_jobs` now reads from `query/jobs_found.jsonl` (the aggregate checkpoint) and writes `query/jobs_scored.jsonl`. One LLM call. No mode switching.
- `hybrid_scorer.py`, `static_scorer.py`, and `profile_store.py` are removed. Issues #13 (perf(scoring): calibrate hybrid/static profile quality and reduce LLM↔static divergence) and #14 (perf(tokens): reduce token consumption in LLM batch scoring) were closed as superseded by this direction.

What changed

- `agent/nodes/analyze_jobs.py`
- `providers/scoring/llm_scorer.py`
- `providers/scoring/hybrid_scorer.py`
- `providers/scoring/static_scorer.py`
- `providers/scoring/profile_store.py`
- `tests/test_analyze_jobs.py`
- `tests/test_hybrid_scorer.py`
- `tests/test_static_scorer.py`
- `tests/test_profile_store.py`

Test plan

- `pytest tests/ -v`
- `infisical run --env=dev -- python scripts/test_node.py analyze_jobs` with a populated `query/jobs_found.jsonl` to confirm the JSONL read, scoring, and that `jobs_scored.jsonl` is written

🤖 Generated with Claude Code