fix(pipeline): graph routing infinite loop + search allow_tools #82

Merged: bayrem merged 8 commits into main from fix/graph-routing-and-search-tools on May 20, 2026

@bjridicodes

Collaborator

Summary

  • Infinite loop in generate_queries: _needs_generate_queries routed based on state["raw_queries"], but the node writes state["queries"] — so the router always saw an empty list and looped back. Replaced the conditional self-edge with a direct add_edge("generate_queries", "search_jobs"). Also fixed the cache-hit path which was reading from state["raw_queries"] (always [] on first run) instead of the queries file directly.

  • anthropic_web search returned nothing: allow_tools: false in config.yaml was applied globally via the factory, including to the search LLM. The Claude CLI needs --dangerously-skip-permissions to invoke its web-search tool. build_llm() now overrides allow_tools=True when task="search", regardless of config.
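
The routing bug in the first bullet can be reproduced without the graph library. The sketch below is a minimal stand-in, not the project's code: `buggy_router`, `direct_edge`, and `run` are hypothetical names, and the state is reduced to the two keys named above. It shows why a router that reads `raw_queries` never leaves a node that writes `queries`, and why an unconditional edge does.

```python
from typing import Callable

State = dict[str, list[str]]


def generate_queries(state: State) -> State:
    # The node stores its result under "queries".
    return {**state, "queries": ["python backend remote"]}


def buggy_router(state: State) -> str:
    # The old router inspected "raw_queries", which this node never fills,
    # so it always picked the self-edge.
    return "search_jobs" if state.get("raw_queries") else "generate_queries"


def direct_edge(state: State) -> str:
    # Stand-in for add_edge("generate_queries", "search_jobs"): unconditional.
    return "search_jobs"


def run(router: Callable[[State], str], max_steps: int = 5) -> tuple[str, int]:
    """Run generate_queries until the router leaves the node or max_steps is hit."""
    state: State = {"raw_queries": [], "queries": []}
    for step in range(1, max_steps + 1):
        state = generate_queries(state)
        next_node = router(state)
        if next_node != "generate_queries":
            return next_node, step
    return "generate_queries", max_steps
```

With the buggy router, `run` exhausts `max_steps` while still on `generate_queries`; with the direct edge it reaches `search_jobs` after one step.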

Test plan

  • 205 tests pass, ruff + mypy clean
  • Full pipeline run with infisical run --env=dev -- python run.py: generate_queries advances to search_jobs without looping; anthropic_web search returns URL candidates

🤖 Generated with Claude Code

bjridicodes and others added 8 commits May 19, 2026 22:05
Two bugs found during the first full pipeline run after the search milestone:

1. generate_queries self-loop: _needs_generate_queries checked state["raw_queries"]
   but the node writes state["queries"], so the router always saw an empty list and
   looped. Replaced the conditional self-edge with a direct edge to search_jobs.
   Also fixed the cache-hit path which read from state["raw_queries"] (always [])
   instead of the queries file.

2. anthropic_web search returned nothing: allow_tools: false in config was applied
   globally, including to the search LLM. The Claude CLI needs
   --dangerously-skip-permissions to invoke web-search tools. Factory now
   overrides allow_tools=True when task="search".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
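
The factory override described in item 2 might look roughly like the sketch below. Only the `build_llm` name, the `task` argument, the `allow_tools` key, and the CLI flag come from this PR; the `LLMSettings` type, the `provider` field, and `cli_flags` are assumptions made so the example is self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    # Hypothetical container; the real factory's return type is not shown in the PR.
    provider: str
    allow_tools: bool


def build_llm(llm_cfg: dict, task: str = "default") -> LLMSettings:
    """Resolve per-task settings; the search task always gets tool access."""
    allow_tools = bool(llm_cfg.get("allow_tools", False))
    if task == "search":
        # Web search cannot work without tools, so the config value is overridden.
        allow_tools = True
    return LLMSettings(
        provider=llm_cfg.get("provider", "anthropic_web"),
        allow_tools=allow_tools,
    )


def cli_flags(settings: LLMSettings) -> list[str]:
    # Flag name taken from the PR description; how it is passed is assumed.
    return ["--dangerously-skip-permissions"] if settings.allow_tools else []
```

Under these assumptions, a call without `task` keeps the configured `allow_tools: false`, while `task="search"` enables tools regardless of config, so any call site that needs web access has to pass the task explicitly.
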
Drop the redundant "→" link column; the run ID cell now carries the href
so the table is one column narrower and every run is still one click away.
Fixed the test that checked for the old trailing link cell pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_node_row_html computed total_tokens = in + out, dropping cache_read and
cache_creation. This caused the pipeline execution table to show lower
numbers than the grand total, making the two sections appear inconsistent.
Now matches _usage_row_html which already counted all four buckets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
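
As a rough illustration of the mismatch, the sketch below compares the old two-bucket sum with a four-bucket sum. The key names `in`, `out`, `cache_read`, and `cache_creation` mirror the wording of the commit message and are assumptions; the real usage records may use different field names.

```python
# Hypothetical key names for the four token buckets.
ALL_BUCKETS = ("in", "out", "cache_read", "cache_creation")


def total_tokens_old(usage: dict[str, int]) -> int:
    # Old per-node behaviour: cache buckets were silently dropped.
    return usage.get("in", 0) + usage.get("out", 0)


def total_tokens(usage: dict[str, int]) -> int:
    # Count every bucket so per-node rows add up to the grand total.
    return sum(usage.get(bucket, 0) for bucket in ALL_BUCKETS)
```

For a node with heavy cache reads, the two functions differ by exactly the cache_read and cache_creation counts, which is the gap the tables used to show.
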
build_llm(cfg["llm"]) without a task arg defaults to task="default" which
resolves allow_tools=False. Company searches with url: hints invoke the
Claude CLI and need --dangerously-skip-permissions to browse the web —
same fix as search_jobs already had.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Grand total line now shows '≈Xk effective compute' in green when cache
tokens are present (formula: new_in + out + 0.1×cache_read), making it
clear that high cache-read is efficient, not wasteful.

Per-node pipeline table replaces the single token total with 'Xin / Yout'
and adds '/ Zcached' (green) when cache-read tokens are present for that
node. Same change applied to the live-page JS so the live view is consistent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
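
The formula and label formats quoted above can be sketched as follows. The function names are hypothetical and the rounding to whole thousands is an assumption; only the 0.1 weight on cache reads and the 'Xin / Yout / Zcached' layout come from the commit message.

```python
def effective_compute(new_in: int, out: int, cache_read: int) -> float:
    # Cache reads are weighted at 0.1, as stated in the commit message.
    return new_in + out + 0.1 * cache_read


def grand_total_label(new_in: int, out: int, cache_read: int) -> str:
    # Assumed formatting: whole thousands, no decimals.
    tokens = effective_compute(new_in, out, cache_read)
    return f"≈{tokens / 1000:.0f}k effective compute"


def node_label(tokens_in: int, tokens_out: int, cached: int = 0) -> str:
    # The cached part is appended only when cache-read tokens are present.
    label = f"{tokens_in}in / {tokens_out}out"
    if cached:
        label += f" / {cached}cached"
    return label
```

For example, 2,000 new input tokens, 1,000 output tokens, and 50,000 cache-read tokens give 2000 + 1000 + 5000 = 8000 effective tokens, even though the raw sum is 53,000.
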
…ctive

Three connected fixes to improve search volume and quality:

1. url_validator: drop known aggregator/listing-page URL patterns before
   hitting Tavily (builtin.com, hnhiring, arc.dev listing pages, etc.).
   These pass URL validation because Tavily can fetch them, but they're
   search-result category pages, not individual postings — scoring rejects
   them at near-100% rate, wasting Tavily extract quota.

2. search_jobs: raise _DIRECTIVE_TARGET 30→50 and _DIRECTIVE_LLM_MAX
   50→80 to target 30-50 validated individual postings per run.

3. web_search SEARCH_DIRECTIVE: instruct the LLM to search each major
   job board (WTTJ, LinkedIn /jobs/view, Lever, Greenhouse, Ashby,
   Workday) with dedicated queries rather than relying on broad web
   results. Explicitly forbid listing/search pages in the FORBIDDEN
   block so the LLM understands what counts as an individual posting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
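
A pre-filter of the kind described in item 1 could be sketched as below. The domains are the ones named in the commit message, but the path patterns are illustrative guesses, not the rules actually shipped in url_validator, and the function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# (domain, path pattern) pairs. The path patterns are assumptions for illustration.
_LISTING_RULES = [
    ("builtin.com", re.compile(r"^/jobs(/|$)")),
    ("hnhiring.com", re.compile(r"")),  # empty pattern: every path on this host matches
    ("arc.dev", re.compile(r"^/remote-jobs/?$")),
]


def is_listing_page(url: str) -> bool:
    """Return True when the URL looks like an aggregator or listing page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return any(
        host == domain and rule.search(parsed.path) is not None
        for domain, rule in _LISTING_RULES
    )


def drop_listing_pages(urls: list[str]) -> list[str]:
    # Filter before any paid extract call so quota is spent on real postings only.
    return [url for url in urls if not is_listing_page(url)]
```

Running the filter before the extract step is the design point: a cheap string check removes candidates that scoring would reject anyway.
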
Prompts extracted from web_search.py now live in:
  query/SEARCH_DIRECTIVE_PROMPT.md  — directive (jobs search, URL candidates)
  query/SEARCH_COMPANY_PROMPT.md    — company single-query search

Edit these files to tune search behaviour without touching Python code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
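
Loading a prompt from one of these files can be as small as the sketch below. `load_prompt` and its `prompt_dir` parameter are hypothetical; the PR does not show how web_search.py reads the files.

```python
from pathlib import Path


def load_prompt(name: str, prompt_dir: Path = Path("query")) -> str:
    """Read <prompt_dir>/<name>.md and return its text without surrounding whitespace."""
    return (prompt_dir / f"{name}.md").read_text(encoding="utf-8").strip()
```

A call such as `load_prompt("SEARCH_DIRECTIVE_PROMPT")` would then pick up edits to the Markdown file on the next run, with no Python change.
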
@bayrem merged commit 872aab3 into main on May 20, 2026
5 checks passed

2 participants