feat(linkedin): implement connector with unofficial API + MCP browser fallback #83

Merged: bayrem merged 10 commits into main from fix/graph-routing-and-search-tools on Jun 2, 2026
Conversation

@bjridicodes

Collaborator

Summary

  • Replaces the LinkedInConnector stub with a full implementation that uses a two-layer approach: the unofficial linkedin-api library as the primary path, and stickerdaniel/linkedin-mcp-server (Patchright browser automation) as an internal fallback
  • The MCP server is cloned to mcp_servers/linkedin-mcp-server/ (gitignored) and started as a subprocess via asyncio.run() and the MCP Python SDK whenever the primary path raises any exception
  • Connector enabled in config/config.yaml with max_concurrent: 1 to reduce ban risk
  • 19 unit tests covering credentials guard, recency suffix stripping, field mapping, fallback trigger, and MCP result parsing

Setup required before running

# One-time browser login for MCP fallback
cd mcp_servers/linkedin-mcp-server && uv run -m linkedin_mcp_server --login

Add to Infisical (dev): LINKEDIN_EMAIL, LINKEDIN_PASSWORD

Test plan

  • pytest tests/test_linkedin_connector.py — 19 tests pass
  • Smoke test primary: infisical run --env=dev -- python -c "from providers.search.connectors.linkedin import LinkedInConnector; c = LinkedInConnector({}); print(c.search('Product Manager Paris', max_results=3))"
  • Full pipeline run: infisical run --env=dev -- python run.py
  • MCP fallback: temporarily use wrong credentials to confirm fallback activates and logs correctly

🤖 Generated with Claude Code

bjridicodes and others added 10 commits May 19, 2026 22:05
Two bugs found during the first full pipeline run after the search milestone:

1. generate_queries self-loop: _needs_generate_queries checked state["raw_queries"]
   but the node writes state["queries"], so the router always saw an empty list and
   looped. Replaced the conditional self-edge with a direct edge to search_jobs.
   Also fixed the cache-hit path which read from state["raw_queries"] (always [])
   instead of the queries file.

2. anthropic_web search returned nothing: allow_tools: false in config was applied
   globally, including to the search LLM. The Claude CLI needs
   --dangerously-skip-permissions to invoke web-search tools. Factory now
   overrides allow_tools=True when task="search".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
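
The self-loop in bug 1 comes down to a key mismatch between the node and the router. A minimal reproduction, assuming a plain dict state and hypothetical function bodies (only the key names `queries` and `raw_queries` come from the commit message):

```python
from typing import Dict, List

State = Dict[str, List[str]]


def generate_queries(state: State) -> State:
    """Node: writes its output under the "queries" key."""
    return {**state, "queries": ["product manager paris"]}


def needs_generate_queries_buggy(state: State) -> bool:
    """Router as described in the bug: reads a key the node never writes."""
    return not state.get("raw_queries")


def needs_generate_queries_fixed(state: State) -> bool:
    """Router reading the key the node actually writes."""
    return not state.get("queries")


state = generate_queries({})
still_loops = needs_generate_queries_buggy(state)   # always True, so the graph loops
would_exit = needs_generate_queries_fixed(state)    # False once queries exist
```

The commit did not patch the router this way; it removed the conditional self-edge in favour of a direct edge to search_jobs, which eliminates the check altogether.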
Drop the redundant "→" link column; the run ID cell now carries the href
so the table is one column narrower and every run is still one click away.
Fixed the test that checked for the old trailing link cell pattern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_node_row_html computed total_tokens = in + out, dropping cache_read and
cache_creation. This caused the pipeline execution table to show lower
numbers than the grand total, making the two sections appear inconsistent.
Now matches _usage_row_html which already counted all four buckets.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
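
To illustrate the discrepancy, here is a small sketch of a four-bucket total. The helper name `total_tokens` and the dictionary key names are assumptions for illustration; the commit only states that input, output, cache_read, and cache_creation must all be counted.

```python
from typing import Dict

# Assumed key names for the four token buckets.
_BUCKETS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def total_tokens(usage: Dict[str, int]) -> int:
    """Sum all four buckets, treating missing or None values as zero."""
    return sum(int(usage.get(key) or 0) for key in _BUCKETS)


usage = {
    "input_tokens": 1200,
    "output_tokens": 300,
    "cache_read_input_tokens": 8000,
    "cache_creation_input_tokens": 500,
}

displayed_before = usage["input_tokens"] + usage["output_tokens"]  # in + out only
displayed_after = total_tokens(usage)                              # all four buckets
```

With these sample numbers the per-node row would have shown 1500 tokens against a grand total of 10000, which is the kind of inconsistency the commit describes.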
build_llm(cfg["llm"]) without a task arg defaults to task="default" which
resolves allow_tools=False. Company searches with url: hints invoke the
Claude CLI and need --dangerously-skip-permissions to browse the web —
same fix as search_jobs already had.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
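
The task-dependent resolution of allow_tools might look roughly like the sketch below. `resolve_allow_tools` is a hypothetical helper, not the project's build_llm; it only mirrors the behaviour described in the commit messages (global default from config, forced on when task="search").

```python
from typing import Any, Dict


def resolve_allow_tools(llm_cfg: Dict[str, Any], task: str = "default") -> bool:
    """Return whether the LLM may use tools for the given task.

    Search tasks always get tools; every other task follows the config value.
    """
    if task == "search":
        return True
    return bool(llm_cfg.get("allow_tools", False))


cfg = {"allow_tools": False}
default_case = resolve_allow_tools(cfg)                 # task omitted
search_case = resolve_allow_tools(cfg, task="search")   # task passed explicitly
```

This shows why a call without a task argument silently loses web access: the override only applies when the caller passes the search task.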
Grand total line now shows '≈Xk effective compute' in green when cache
tokens are present (formula: new_in + out + 0.1×cache_read), making it
clear that high cache-read is efficient, not wasteful.

Per-node pipeline table replaces the single token total with 'Xin / Yout'
and adds '/ Zcached' (green) when cache-read tokens are present for that
node. Same change applied to the live-page JS so the live view is consistent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
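
The effective-compute formula above (new_in + out + 0.1×cache_read) can be written out as a short sketch. The function names and the one-decimal rounding are assumptions; only the formula and the '≈Xk effective compute' label come from the commit message.

```python
def effective_compute(new_in: int, out: int, cache_read: int) -> float:
    """Weight cache-read tokens at 10% of a fresh input token."""
    return new_in + out + 0.1 * cache_read


def format_effective(new_in: int, out: int, cache_read: int) -> str:
    """Render the value in thousands, e.g. '≈2.3k effective compute'."""
    thousands = effective_compute(new_in, out, cache_read) / 1000
    return f"≈{thousands:.1f}k effective compute"


value = effective_compute(1200, 300, 8000)   # 1200 + 300 + 0.1 * 8000
label = format_effective(1200, 300, 8000)
```

Here 8000 cache-read tokens contribute only 800 to the effective figure, which is the point the green label is meant to convey.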
…ctive

Three connected fixes to improve search volume and quality:

1. url_validator: drop known aggregator/listing-page URL patterns before
   hitting Tavily (builtin.com, hnhiring, arc.dev listing pages, etc.).
   These pass URL validation because Tavily can fetch them, but they're
   search-result category pages, not individual postings — scoring rejects
   them at near-100% rate, wasting Tavily extract quota.

2. search_jobs: raise _DIRECTIVE_TARGET 30→50 and _DIRECTIVE_LLM_MAX
   50→80 to target 30-50 validated individual postings per run.

3. web_search SEARCH_DIRECTIVE: instruct the LLM to search each major
   job board (WTTJ, LinkedIn /jobs/view, Lever, Greenhouse, Ashby,
   Workday) with dedicated queries rather than relying on broad web
   results. Explicitly forbid listing/search pages in the FORBIDDEN
   block so the LLM understands what counts as an individual posting.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
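
A pre-filter like the one in item 1 could be sketched as below. The rule table is purely illustrative: the host names are taken from the commit message, but the path prefixes, the `LISTING_RULES` structure, and the function names are assumptions, not the patterns used in url_validator.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# (host, path prefix) pairs treated as listing pages. Prefixes are assumed.
LISTING_RULES: List[Tuple[str, str]] = [
    ("builtin.com", "/jobs"),
    ("hnhiring.com", "/"),
    ("arc.dev", "/remote-jobs"),
]


def is_listing_page(url: str) -> bool:
    """True if the URL matches a known aggregator or listing-page rule."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parsed.path or "/"
    return any(host == h and path.startswith(p) for h, p in LISTING_RULES)


def drop_listing_pages(urls: List[str]) -> List[str]:
    """Filter candidates before spending extract quota on them."""
    return [u for u in urls if not is_listing_page(u)]


candidates = [
    "https://builtin.com/jobs/remote",
    "https://hnhiring.com/locations/paris",
    "https://jobs.lever.co/acme/1234",
    "https://www.arc.dev/remote-jobs",
]
kept = drop_listing_pages(candidates)
```

Running the filter before the fetch step means rejected URLs never consume Tavily extract quota.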
Prompts extracted from web_search.py now live in:
  query/SEARCH_DIRECTIVE_PROMPT.md  — directive (jobs search, URL candidates)
  query/SEARCH_COMPANY_PROMPT.md    — company single-query search

Edit these files to tune search behaviour without touching Python code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
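
Loading a prompt from one of these files might look like the following sketch. `load_prompt` and its `prompt_dir` parameter are hypothetical; the commit does not show how web_search.py reads the files.

```python
from pathlib import Path


def load_prompt(name: str, prompt_dir: Path) -> str:
    """Read a prompt file from disk, failing loudly if it is missing."""
    path = prompt_dir / name
    if not path.is_file():
        raise FileNotFoundError(f"prompt file not found: {path}")
    return path.read_text(encoding="utf-8").strip()
```

A call such as `load_prompt("SEARCH_DIRECTIVE_PROMPT.md", Path("query"))` would then pick up edits to the file on the next run without any code change.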
… fallback

Primary path uses linkedin-api (unofficial voyager API, email/password auth).
Falls back transparently to stickerdaniel/linkedin-mcp-server (Patchright browser)
on any primary failure. MCP server installed under mcp_servers/ and ignored by git.

- providers/search/connectors/linkedin.py: full implementation replacing stub
- requirements.txt: add linkedin-api>=2.3.1, mcp>=1.0.0
- config/config.yaml: enable linkedin connector, max_concurrent: 1
- agent/nodes/search_jobs.py: add linkedin to _DEFAULT_MAX_CONCURRENT
- tests/test_linkedin_connector.py: 19 unit tests, all passing
- .gitignore: exclude mcp_servers/ (third-party local installs)

One-time setup required: cd mcp_servers/linkedin-mcp-server && uv run -m linkedin_mcp_server --login
Secrets needed in Infisical: LINKEDIN_EMAIL, LINKEDIN_PASSWORD

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mypy cannot infer the Linkedin type from a lazy import inside a method,
so it types self._client as None and flags .search_jobs() as attr-defined.
The ignore is safe: the assignment on the line above guarantees non-None.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bayrem merged commit 83c6644 into main on Jun 2, 2026
5 checks passed