feat(search): directive prompt for anthropic_web - single comprehensive LLM call (closes #79) #80
Merged
…ve call (closes #79)
Replace N keyword queries with one directive LLM call that carries full context: all target positions, all locations, and company ATS hints. Strict anti-hallucination rules forbid the LLM from generating URLs from memory or training data. Capped at 30 results per run.
URL validation now only drops network-unreachable domains (DNS/connection failure). ATS platforms return HTTP 200 for any path regardless of whether the job exists, so status codes were not a reliable hallucination signal.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After the LLM directive call returns URL candidates, run Tavily extract on every URL. URLs where Tavily returns no content are dropped: they are hallucinated, stale, or unreachable. URLs that pass have their description replaced with the real posting content (up to 2000 chars).
LLM now asked for max_results+20 candidates so Tavily filtering doesn't leave us short of the 30-result target.
Removed unreliable HEAD-based URL validation; Tavily content extraction is the definitive signal.
Degrades gracefully: if TAVILY_API_KEY is not set, the Tavily step is skipped and LLM output is returned as-is.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
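The filtering step above might look roughly like the following. This is a sketch under stated assumptions: `extract` is a hypothetical callable standing in for the Tavily extract call (it returns page text, or `None`/empty when nothing could be extracted), and the function name and dict shape are invented for illustration. Passing `extract=None` models the case where TAVILY_API_KEY is not set.

```python
from typing import Callable, Dict, List, Optional

MAX_DESCRIPTION_CHARS = 2000  # "up to 2000 chars" per the commit message

# Hypothetical signature: URL in, extracted text (or None) out.
ExtractFn = Callable[[str], Optional[str]]


def validate_candidates(
    candidates: List[Dict[str, str]],
    extract: Optional[ExtractFn],
    max_results: int = 30,
) -> List[Dict[str, str]]:
    """Keep only candidates whose URL yields real content."""
    if extract is None:
        # No extractor configured: skip validation, return LLM output as-is.
        return list(candidates)

    validated = []
    for job in candidates:
        content = extract(job["url"])
        if not content or not content.strip():
            continue  # hallucinated, stale, or unreachable
        # Replace any LLM-written description with the extracted text.
        validated.append({**job, "description": content[:MAX_DESCRIPTION_CHARS]})
        if len(validated) == max_results:
            break
    return validated
```

Requesting more candidates than the target (max_results+20 in the commit) leaves headroom for the drops before the cap is applied.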
…inct modules
web_search.py: returns URL candidates only ({url, source, found_in_snippet}).
LLM now returns a URL-only JSON payload — no fabricated descriptions.
url_validator.py (new): Tavily extract validates URLs, drops hallucinated
or unreachable ones (16/26 dropped in live test), builds job dicts from
real extracted content + URL-pattern metadata.
search_jobs.py: calls both steps explicitly — search then validate — with
separate log lines for each. Fixed config path bug (_get_positions and
locations were reading from wrong key).
Live result: 26 LLM candidates → 10 Tavily-validated → 8 after semantic
dedup. All 8 jobs carry 2000 chars of real extracted posting content.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
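To make the URL-only candidate shape concrete, here is a minimal parsing sketch. It assumes the LLM reply is a top-level JSON array of objects; the real payload layout in `web_search.py` is not shown here, so the function name and the handling of malformed input are illustrative only. The three keys are the ones listed in the commit message.

```python
import json
from typing import Any, Dict, List

CANDIDATE_KEYS = ("url", "source", "found_in_snippet")


def parse_candidates(raw: str) -> List[Dict[str, Any]]:
    """Parse an LLM reply into URL candidates, ignoring everything else."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, list):  # assumed layout: top-level array
        return []

    seen = set()
    candidates = []
    for item in payload:
        if not isinstance(item, dict):
            continue
        url = item.get("url")
        if not isinstance(url, str) or not url.startswith(("http://", "https://")):
            continue
        if url in seen:
            continue  # drop exact duplicate URLs
        seen.add(url)
        # Keep only the three expected keys; any description is discarded.
        candidates.append({key: item.get(key) for key in CANDIDATE_KEYS})
    return candidates
```

Discarding every field except the three expected keys means a description the LLM adds anyway never reaches the validation step.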
…Tavily extract
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bayrem
previously approved these changes
May 19, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…zation header
The file had two docstrings concatenated without a closing triple-quote, causing a syntax error that failed ruff/mypy. It also had duplicate search() and extract() method definitions left over from the merge.
Moved api_key from the request body to an Authorization Bearer header (addresses a GitHub Advanced Security flag).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
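The header change can be illustrated with the standard library alone. In this sketch the endpoint URL, the function name, and the `urls` body field are placeholders chosen for the example, not the client code from this PR; the point is only that the key travels in the `Authorization` header and never appears in the JSON body.

```python
import json
import urllib.request
from typing import List


def build_extract_request(endpoint: str, api_key: str, urls: List[str]) -> urllib.request.Request:
    """Build a POST request with the key in the header, not the body."""
    # Hypothetical body shape; the secret is deliberately absent from it.
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request is only constructed here, not sent; a caller would pass it to `urllib.request.urlopen` with a timeout.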
bayrem
approved these changes
May 19, 2026
Summary
- Replaces the N keyword queries in anthropic_web with a single directive prompt that gives the LLM full context: all target positions, all locations, and company ATS hints in one call
- Company ATS hints (url: entries) are formatted into the prompt so the LLM knows exactly where to look for each company
- Companies with hint: none are omitted from the focused list (previous discovery failure), but the LLM still searches for them generally
Test plan
🤖 Generated with Claude Code