feat(search): directive prompt for anthropic_web — single comprehensive LLM call (closes #79) by bjridicodes · Pull Request #80 · bayrem/AJSAA

bjridicodes · 2026-05-19T18:53:48Z

Summary

Replaces the N-query keyword loop for anthropic_web with a single directive prompt that gives the LLM full context in one call: all target positions, all locations, and company ATS hints.
Adds strict anti-hallucination rules: the LLM is forbidden from generating URLs from memory or training data and must only use URLs found via web search.
Formats company hints (Greenhouse/Lever/Ashby slugs, url: entries) into the prompt so the LLM knows exactly where to look for each company.
Omits companies with hint: none (a previous discovery failure) from the focused list; the LLM still searches for them generally.
Fixes URL validation: it previously dropped valid jobs because ATS servers return 403/404 to unauthenticated agents even for real postings. It now drops only network-unreachable URLs (DNS/connection failure).
Caps output at 30 results per run.
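
A minimal sketch of how such a directive prompt could be assembled, to make the points above concrete. The function name `build_directive_prompt`, the shape of `company_hints` (company name mapped to a hint string), and the prompt wording are all assumptions for illustration; the actual prompt in this PR may look different.

```python
from typing import Mapping, Sequence

MAX_RESULTS = 30  # per-run cap mentioned in the summary


def build_directive_prompt(
    positions: Sequence[str],
    locations: Sequence[str],
    company_hints: Mapping[str, str],
    max_results: int = MAX_RESULTS,
) -> str:
    """Assemble one prompt carrying all positions, locations, and ATS hints.

    Hypothetical helper: names and wording are illustrative only.
    """
    # Companies whose hint is "none" are left out of the focused list;
    # the general search instruction below still covers them.
    focused = [
        f"- {company}: {hint}"
        for company, hint in company_hints.items()
        if hint.strip().lower() != "none"
    ]
    lines = [
        "Find current job postings using web search.",
        "Positions: " + ", ".join(positions),
        "Locations: " + ", ".join(locations),
        "Known ATS locations per company:",
        *focused,
        "Rules:",
        "- Only return URLs that appeared in web search results.",
        "- Never produce a URL from memory or training data.",
        f"- Return at most {max_results} results.",
    ]
    return "\n".join(lines)
```

Calling it with `{"Acme": "greenhouse:acme", "Globex": "none"}` would list Acme in the focused section and leave Globex out.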

Test plan

229 tests passing
ruff clean, mypy clean
Directive prompt previewed with real config data — correct formatting

🤖 Generated with Claude Code

…ve call (closes #79)

Replace N keyword queries with one directive LLM call that carries full context: all target positions, all locations, and company ATS hints. Strict anti-hallucination rules forbid the LLM from generating URLs from memory or training data. Capped at 30 results per run.

URL validation now only drops network-unreachable domains (DNS/connection failure). ATS platforms return HTTP 200 for any path regardless of whether the job exists, so status codes were not a reliable hallucination signal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
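
The reachability-only check described in this commit might look roughly like the sketch below. It covers only the DNS half of "DNS/connection failure" and ignores HTTP status codes entirely. `drop_unreachable` and the injectable `resolve` parameter are hypothetical names, not taken from the PR.

```python
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

Resolver = Callable[[str], object]


def _dns_resolve(host: str) -> object:
    # Raises socket.gaierror (a subclass of OSError) when the name does not resolve.
    return socket.getaddrinfo(host, 443)


def drop_unreachable(
    urls: Iterable[str], resolve: Resolver = _dns_resolve
) -> list[str]:
    """Keep URLs whose host resolves; HTTP status is deliberately not checked."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue  # not a usable absolute URL
        try:
            resolve(host)
        except OSError:
            continue  # DNS failure: treat the URL as unreachable
        kept.append(url)
    return kept
```

Passing the resolver in as a parameter keeps the function testable without network access.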

After the LLM directive call returns URL candidates, run Tavily extract on every URL. URLs where Tavily returns no content are dropped: they are hallucinated, stale, or unreachable. URLs that pass have their description replaced with the real posting content (up to 2000 chars).

The LLM is now asked for max_results+20 candidates so Tavily filtering doesn't leave us short of the 30-result target. Removed unreliable HEAD-based URL validation; Tavily content extraction is the definitive signal.

Degrades gracefully: if TAVILY_API_KEY is not set, the Tavily step is skipped and LLM output is returned as-is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
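
One way to picture the filtering step from this commit, as a sketch under stated assumptions: `extract` is a hypothetical callable standing in for the Tavily extract call and is assumed to return a mapping from URL to extracted text. The function names and the final truncation to `max_results` are likewise illustrative.

```python
from typing import Callable, Mapping, Optional, Sequence

MAX_DESCRIPTION_CHARS = 2000
OVERSAMPLE = 20  # extra candidates requested so filtering does not leave a shortfall

# Hypothetical extractor signature: list of URLs in, {url: content} out.
Extractor = Callable[[Sequence[str]], Mapping[str, str]]


def candidates_to_request(max_results: int) -> int:
    """Number of candidates to ask the LLM for (max_results + 20)."""
    return max_results + OVERSAMPLE


def validate_candidates(
    candidates: Sequence[dict],
    extract: Extractor,
    api_key: Optional[str],
    max_results: int = 30,
) -> list[dict]:
    """Drop candidates with no extracted content; replace descriptions otherwise."""
    if not api_key:
        # Graceful degradation: no key, so return the LLM output unchanged.
        return list(candidates)
    contents = extract([c["url"] for c in candidates])
    validated = []
    for candidate in candidates:
        content = contents.get(candidate["url"], "").strip()
        if not content:
            continue  # hallucinated, stale, or unreachable
        validated.append(
            {**candidate, "description": content[:MAX_DESCRIPTION_CHARS]}
        )
    return validated[:max_results]
```

In practice the key would presumably come from `os.environ.get("TAVILY_API_KEY")`; it is a plain argument here so the behaviour is easy to exercise in isolation.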

…inct modules

web_search.py: returns URL candidates only ({url, source, found_in_snippet}). The LLM now returns a URL-only JSON payload with no fabricated descriptions.
url_validator.py (new): Tavily extract validates URLs, drops hallucinated or unreachable ones (16/26 dropped in live test), and builds job dicts from real extracted content plus URL-pattern metadata.
search_jobs.py: calls both steps explicitly (search, then validate) with separate log lines for each. Fixed a config path bug (_get_positions and locations were reading from the wrong key).

Live result: 26 LLM candidates → 10 Tavily-validated → 8 after semantic dedup. All 8 jobs carry 2000 chars of real extracted posting content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
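
As an illustration of "URL-pattern metadata", the sketch below derives an ATS name and company slug from a candidate URL and combines them with extracted content. The host table reflects commonly seen public URL shapes for these platforms and is an assumption; `metadata_from_url` and `build_job` are hypothetical names, not the functions in url_validator.py.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Assumed host-to-platform table; the PR's own mapping may differ.
ATS_HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}


def metadata_from_url(url: str) -> dict[str, str]:
    """Infer ATS platform and company slug from the URL alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    meta = {"url": url, "ats": ATS_HOSTS.get(host, "unknown")}
    # On the assumed URL shapes, the first path segment is the company slug.
    if meta["ats"] != "unknown" and segments:
        meta["company_slug"] = segments[0]
    return meta


def build_job(candidate: Mapping[str, object], content: str) -> dict[str, str]:
    """Build a job dict from a URL-only candidate plus real extracted content."""
    job = metadata_from_url(str(candidate["url"]))
    job["source"] = str(candidate.get("source", ""))
    job["description"] = content[:2000]
    return job
```

The description comes only from extracted content, never from the LLM, which matches the URL-only payload described above.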

…zation header

The file had two docstrings concatenated without a closing triple-quote, causing a syntax error that failed ruff/mypy. It also had duplicate search() and extract() method definitions from the merge. Moved api_key from the request body to an Authorization Bearer header (addresses a GitHub Advanced Security flag).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
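
A small sketch of the header change, using only the standard library. The endpoint URL and the `{"urls": [...]}` body shape are assumptions about the Tavily extract API, and `build_extract_request` is a hypothetical helper; the connector in providers/search/connectors/tavily.py may be structured differently. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Sequence

EXTRACT_URL = "https://api.tavily.com/extract"  # assumed endpoint


def build_extract_request(
    urls: Sequence[str], api_key: str
) -> urllib.request.Request:
    """Build a POST request that carries the key in a header, not the body."""
    # The JSON body holds only the URLs, so the secret never appears in payload logs.
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping the key out of the body means that serialising or logging the payload cannot leak it.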

bjridicodes and others added 5 commits May 19, 2026 18:53

docs(readme): reflect final search architecture — directive prompt + …

9f2f674

…Tavily extract Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/79-directive-search-prompt

a45d018

bayrem previously approved these changes May 19, 2026

github-advanced-security AI found potential problems May 19, 2026

Comment thread providers/search/connectors/tavily.py Fixed

bjridicodes and others added 2 commits May 19, 2026 19:40

ci: trigger fresh checks on latest commit

5960079

docs(readme): remove rebase merge artifacts — deduplicated sections

4bcb02a

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bjridicodes dismissed bayrem’s stale review via 4bcb02a May 19, 2026 19:44

bayrem approved these changes May 19, 2026

bayrem merged commit 96c8314 into main May 19, 2026
9 checks passed

bayrem merged 8 commits into main from feat/79-directive-search-prompt
