
feat(search): directive prompt for anthropic_web — single comprehensive LLM call (closes #79)#80

Merged
bayrem merged 8 commits into main from feat/79-directive-search-prompt
May 19, 2026


@bjridicodes

Collaborator

Summary

  • Replaces the N-query keyword loop for anthropic_web with a single directive prompt that gives the LLM full context: all target positions, all locations, and company ATS hints in one call
  • Strict anti-hallucination rules: LLM is forbidden from generating URLs from memory or training data; must only use URLs found via web search
  • Company hints (Greenhouse/Lever/Ashby slugs and url: entries) are formatted into the prompt so the LLM knows exactly where to look for each company
  • Companies with hint: none (discovery previously failed for them) are omitted from the focused list, but the LLM still searches for them generally
  • URL validation fixed: the previous check dropped valid jobs because ATS servers return 403/404 to unauthenticated agents even for real postings; it now drops only network-unreachable URLs (DNS/connection failure)
  • Capped at 30 results per run
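The summary lists what goes into the single directive call but not how the prompt is assembled. Below is a minimal Python sketch of one way to do it, assuming a hypothetical config in which each company carries a hint string such as greenhouse:acme, url:https://..., or none. The names Company, format_company_hint and build_directive_prompt, and the board URL patterns, are illustrative and not taken from this PR.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from an ATS hint prefix to that platform's public
# job-board URL pattern (the project's real config format may differ).
ATS_BOARDS = {
    "greenhouse": "https://boards.greenhouse.io/{slug}",
    "lever": "https://jobs.lever.co/{slug}",
    "ashby": "https://jobs.ashbyhq.com/{slug}",
}

MAX_RESULTS = 30  # per-run cap from the PR summary


@dataclass
class Company:
    name: str
    hint: str  # assumed forms: "greenhouse:acme", "url:https://...", "none"


def format_company_hint(company: Company) -> str | None:
    """Render one company as a focused-list line, or None if it has no usable hint."""
    kind, _, value = company.hint.partition(":")
    if kind == "url" and value:
        return f"- {company.name}: {value}"
    pattern = ATS_BOARDS.get(kind)
    if pattern is None or not value:
        return None  # covers "none" and unknown hint types
    return f"- {company.name}: {pattern.format(slug=value)}"


def build_directive_prompt(
    positions: list[str],
    locations: list[str],
    companies: list[Company],
    max_results: int = MAX_RESULTS,
) -> str:
    """Assemble one directive prompt carrying positions, locations and ATS hints."""
    hint_lines = [line for c in companies if (line := format_company_hint(c))]
    parts = [
        f"Use web search to find up to {max_results} open job postings.",
        "Target positions: " + ", ".join(positions),
        "Locations: " + ", ".join(locations),
    ]
    if hint_lines:
        parts.append("Check these company job boards first:")
        parts.extend(hint_lines)
    # Anti-hallucination rule: URLs must come from search results only.
    parts.append(
        "Rules: only return URLs that appeared in your web search results. "
        "Never write a URL from memory or training data."
    )
    return "\n".join(parts)
```

Companies whose hint is none simply produce no focused-list line; they still fall under the general positions/locations search.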

Test plan

  • 229 tests passing
  • ruff clean, mypy clean
  • Directive prompt previewed with real config data; formatting is correct

🤖 Generated with Claude Code

bjridicodes and others added 5 commits May 19, 2026 18:53
…ve call (closes #79)

Replace N keyword queries with one directive LLM call that carries full
context: all target positions, all locations, and company ATS hints. Strict
anti-hallucination rules forbid the LLM from generating URLs from memory or
training data. Capped at 30 results per run.

URL validation now only drops network-unreachable domains (DNS/connection
failure). ATS platforms return HTTP 200 for any path regardless of whether
the job exists, so status codes were not a reliable hallucination signal.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
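The reachability rule in this commit (keep a URL unless DNS resolution or the TCP connection fails, and ignore the HTTP status entirely) can be sketched with the Python standard library as below. This is a hypothetical illustration: is_network_reachable and drop_unreachable are not names from this PR, and a later commit in this PR removed this check in favor of Tavily content extraction.

```python
import socket
from urllib.parse import urlsplit


def is_network_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL's host resolves and accepts a TCP connection.

    HTTP status codes are deliberately not consulted: a 200, 403 or 404
    from an ATS server says little about whether the posting exists.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return False
    # Fall back to the scheme's default port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # A DNS failure raises socket.gaierror, a subclass of OSError.
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except OSError:
        return False
    for family, socktype, proto, _canonname, sockaddr in addrinfo:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # refused or timed out: try the next resolved address
    return False


def drop_unreachable(candidates: list, timeout: float = 5.0) -> list:
    """Keep only candidates whose URL is reachable at the network level."""
    return [c for c in candidates if is_network_reachable(c["url"], timeout)]
```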
After the LLM directive call returns URL candidates, run Tavily extract
on every URL. URLs where Tavily returns no content are dropped — they are
hallucinated, stale, or unreachable. URLs that pass have their description
replaced with the real posting content (up to 2000 chars).

The LLM is now asked for max_results+20 candidates so that Tavily filtering does
not leave the run short of the 30-result target. Removed the unreliable HEAD-based
URL validation; Tavily content extraction is now the deciding signal.

Degrades gracefully: if TAVILY_API_KEY is not set, Tavily step is skipped
and LLM output is returned as-is.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
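A sketch of the filtering step described in this commit, assuming a hypothetical extract callable that takes a list of URLs and returns a mapping of URL to extracted text (the real connector in providers/search/connectors/tavily.py may have a different signature). It requests max_results+20 candidates, drops candidates with no extracted content, truncates the rest to 2000 characters, and skips the step entirely when TAVILY_API_KEY is unset. All function names here are illustrative.

```python
from __future__ import annotations

import os
from typing import Callable, Dict, List

MAX_RESULTS = 30
CANDIDATE_HEADROOM = 20  # extra candidates requested to absorb Tavily drops
MAX_DESCRIPTION_CHARS = 2000

# Hypothetical extract signature: list of URLs -> {url: extracted text}.
ExtractFn = Callable[[List[str]], Dict[str, str]]


def candidates_to_request(max_results: int = MAX_RESULTS) -> int:
    """How many URL candidates to ask the LLM for."""
    return max_results + CANDIDATE_HEADROOM


def validate_with_extract(
    candidates: list[dict], extract: ExtractFn | None, max_results: int = MAX_RESULTS
) -> list[dict]:
    if extract is None:
        # Graceful degradation: no Tavily client, return the LLM output as-is.
        return candidates
    contents = extract([c["url"] for c in candidates])
    validated = []
    for cand in candidates:
        text = (contents.get(cand["url"]) or "").strip()
        if not text:
            continue  # no content: hallucinated, stale, or unreachable
        # Replace the description with real posting content, truncated.
        validated.append({**cand, "description": text[:MAX_DESCRIPTION_CHARS]})
    return validated[:max_results]


def get_extract_fn(make_client: Callable[[str], ExtractFn]) -> ExtractFn | None:
    """Return an extract function only when TAVILY_API_KEY is set."""
    api_key = os.environ.get("TAVILY_API_KEY")
    return make_client(api_key) if api_key else None
```

Injecting the extract function keeps the filter testable without network access or an API key.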
…inct modules

web_search.py: returns URL candidates only ({url, source, found_in_snippet}).
  The LLM now returns a URL-only JSON payload with no fabricated descriptions.

url_validator.py (new): Tavily extract validates URLs, drops hallucinated
  or unreachable ones (16/26 dropped in live test), builds job dicts from
  real extracted content + URL-pattern metadata.

search_jobs.py: calls both steps explicitly (search, then validate), with
  separate log lines for each. Fixed a config path bug (_get_positions and
  locations were reading from the wrong key).

Live result: 26 LLM candidates → 10 Tavily-validated → 8 after semantic
dedup. All 8 jobs carry 2000 chars of real extracted posting content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
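The candidate shape {url, source, found_in_snippet} and the URL-pattern metadata could look roughly like the sketch below. It assumes the URL-only payload has the form {"urls": [...]} and that found_in_snippet means the URL appeared verbatim in the search-result text; both are guesses, as are the helper names and the host table.

```python
from __future__ import annotations

import json
from urllib.parse import urlsplit

# Hypothetical host -> ATS platform table used for URL-pattern metadata.
ATS_HOSTS = {
    "boards.greenhouse.io": "greenhouse",
    "job-boards.greenhouse.io": "greenhouse",
    "jobs.lever.co": "lever",
    "jobs.ashbyhq.com": "ashby",
}


def url_metadata(url: str) -> dict[str, str | None]:
    """Derive source platform and company slug from the URL alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    platform = ATS_HOSTS.get(host)
    segments = [s for s in parts.path.split("/") if s]
    # On these boards the first path segment is the company slug.
    company = segments[0] if platform and segments else None
    return {"source": platform or host, "company_slug": company}


def parse_url_candidates(payload: str, snippet_text: str = "") -> list[dict]:
    """Turn the LLM's URL-only JSON into deduplicated candidate dicts."""
    data = json.loads(payload)
    seen: set[str] = set()
    candidates = []
    for url in data.get("urls", []):
        if not isinstance(url, str) or url in seen:
            continue
        seen.add(url)
        candidates.append({
            "url": url,
            "source": url_metadata(url)["source"],
            # Assumed meaning: the URL appeared verbatim in search snippets.
            "found_in_snippet": url in snippet_text,
        })
    return candidates


def build_job(candidate: dict, extracted: str, max_chars: int = 2000) -> dict:
    """Combine a validated candidate with real extracted posting content."""
    return {**candidate, **url_metadata(candidate["url"]),
            "description": extracted[:max_chars]}
```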
…Tavily extract

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bayrem previously approved these changes May 19, 2026
Comment thread providers/search/connectors/tavily.py Fixed
…zation header

The file had two docstrings concatenated without a closing triple quote, causing
a syntax error that failed ruff/mypy. It also had duplicate search() and extract()
method definitions left over from the merge. Moved api_key from the request body to
the Authorization Bearer header (addresses a GitHub Advanced Security flag).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
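Moving the key out of the JSON body and into a header can be sketched with the standard library as below. The endpoint https://api.tavily.com/extract and the Bearer scheme are assumed from Tavily's public API documentation; build_extract_request is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TAVILY_EXTRACT_URL = "https://api.tavily.com/extract"


def build_extract_request(api_key: str, urls: list) -> urllib.request.Request:
    """Build (but do not send) an extract request with the key in a header."""
    # The key travels only in the Authorization header, so logging or
    # echoing the JSON body cannot leak it.
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        TAVILY_EXTRACT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```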
bayrem merged commit 96c8314 into main May 19, 2026
9 checks passed


3 participants