Wormhole is an MCP control plane for coding agents working in large repositories. It gives agents repo indexes, symbol context, impact analysis, verification guidance, evidence records, resume checkpoints, and advisory gates. It is not an autonomous coding agent.
The 0.18 line is a proof-and-hardening release. New claims should point to files under benchmarks/baselines/ or benchmarks/results/.
0.18.0 ships on pilot-scoped evidence by explicit decision (2026-07-03): the hardening implementation, full dogfood (0 failures), full test suite, and rc tarball are done, and the eval gate is closed at pilot scope. The full N=3 two-repo matrix is deferred to 0.19. Benchmark claims below are baseline measurements plus one 12-run pilot — not a general success-rate claim. See docs/planning/eval-readout-2026-07.md for the full readout, decision rules, and threats to validity.
- Default core surface: 76 tools in guided/full mode, 45,151 bytes, about 11,288 tokens. Source: benchmarks/baselines/tool-surface-cost.json.
- Layered core surface: 24 visible tools, 14,844 bytes, about 3,711 tokens. Source: benchmarks/baselines/tool-surface-cost.json.
- Full all-packs surface remains available by opt-in: 250 tools, 142,187 bytes, about 35,547 tokens. Source: benchmarks/baselines/tool-surface-cost-all-packs.json.
- Layered all-packs surface keeps the startup schema small: 25 visible tools, 15,221 bytes, about 3,806 tokens. Source: benchmarks/baselines/tool-surface-cost-all-packs.json.
- Large-repo baseline against VS Code: 14,147 files, 368,616 symbols, 499,585 edges, 1,586,548,736-byte SQLite DB, 442,462 ms build time, 84,111 ms measured query latency. Source: benchmarks/baselines/large-repo-index-vscode.json.
- Self reachability baseline records the required pre-deletion audit run for this repo and keeps all removal findings advisory. Source: benchmarks/baselines/repo-reachability-self.json.
- Pilot eval (12 runs, vscode @ d70651f2239c, N=1 per cell, proxy tokens): wormhole-layered 2/4 task success vs 1/4 for both unaided and wormhole-full, with ~25% fewer tokens and ~25% less wall clock than unaided. Two of four task-type checks were later found defective and fixed after the run; see the caveat in docs/planning/eval-readout-2026-07.md. Source: benchmarks/results/2026-07-03T21-15-00Z-recomputed/.
- Source checkouts contain the Phase 2 eval harness at benchmarks/phase2/phase2-eval.ts and the internal runnable plan at benchmarks/phase2/plan.json. The npm package does not ship that internal plan; it ships only redacted Phase 2 public evidence at benchmarks/phase2/public-plan.json plus the baseline JSON files. The full N=3 two-repo matrix has not run; Wormhole claims pilot-scoped evidence only, not a general real-repo success-rate or token-efficiency win.
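
The byte and token figures in the surface bullets above are consistent with a proxy of roughly four bytes per token, rounded up. That ratio is an inference from the published numbers, not a documented formula; the baseline JSON files remain the source of truth. A minimal sketch of the arithmetic, with hypothetical function names:

```typescript
// Proxy token estimate. ASSUMPTION: about 4 bytes per token, rounded up.
// This ratio is inferred from the figures above, not taken from Wormhole's code.
function estimateTokens(bytes: number): number {
  return Math.ceil(bytes / 4);
}

// Fractional size reduction of a candidate surface relative to a baseline surface.
function reduction(baselineBytes: number, candidateBytes: number): number {
  return 1 - candidateBytes / baselineBytes;
}

// Byte counts copied from the bullets above.
const surfaces: Array<[string, number]> = [
  ["default core", 45151],
  ["layered core", 14844],
  ["full all-packs", 142187],
  ["layered all-packs", 15221],
];

for (const [name, bytes] of surfaces) {
  console.log(`${name}: ${bytes} bytes, about ${estimateTokens(bytes)} tokens`);
}

const saved = reduction(45151, 14844) * 100;
console.log(`layered core is ${saved.toFixed(1)}% smaller than default core`);
```

The same helper can be pointed at any other byte count from the baseline files to sanity-check a quoted token figure.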
Run the server after publishing or from an npm-installed package:

    npx wormhole-mcp

Use layered exposure when startup token cost matters:

    WORMHOLE_EXPOSURE=layered npx wormhole-mcp

PowerShell:

    $env:WORMHOLE_EXPOSURE = "layered"; npx wormhole-mcp

For MCP clients on Windows, prefer the JSON env form below.
Claude Code MCP config:

    claude mcp add-json wormhole '{"type":"stdio","command":"npx","args":["-y","wormhole-mcp"],"env":{"WORMHOLE_EXPOSURE":"layered"}}'

Project .mcp.json:

    {
      "mcpServers": {
        "wormhole": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "wormhole-mcp"],
          "env": {
            "WORMHOLE_EXPOSURE": "layered"
          }
        }
      }
    }

First retrieval call:

    { "repoRoot": "/absolute/path/to/repo", "query": "startup entrypoint", "limit": 5 }

Exposure controls schema visibility:
- WORMHOLE_EXPOSURE=layered: registers the small visible surface plus tool_invoke.
- WORMHOLE_EXPOSURE=guided: registers the active load unit with guidance-oriented descriptions.
- WORMHOLE_EXPOSURE=full: registers the active load unit without layered hiding.
Pack loading controls which tools exist in the active load unit:
- unset or WORMHOLE_PACKS=core: repo index/query/symbol/impact/verify/gate/resume core.
- WORMHOLE_PACKS=all: all current tools.
- WORMHOLE_PACKS=discovery,verification: core plus the named packs.
In layered mode, hidden active tools can be called through tool_invoke. Tools from disabled packs are not invokable through that dispatcher.
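
The pack-loading and dispatch rules above can be sketched as follows. This is an illustration of the described behavior, not Wormhole's implementation: the function names, the list of known packs, and the idea of a single pack name per tool are all hypothetical.

```typescript
// HYPOTHETICAL sketch of the documented WORMHOLE_PACKS rules.
// Names and data shapes are illustrative, not Wormhole internals.
type PackName = string;

// unset or "core" -> core only; "all" -> every known pack;
// any other value -> core plus the comma-separated packs it names.
function resolvePacks(value: string | undefined, knownPacks: PackName[]): Set<PackName> {
  const active = new Set<PackName>(["core"]);
  const raw = (value ?? "").trim();
  if (raw === "" || raw === "core") return active;
  if (raw === "all") {
    knownPacks.forEach((pack) => active.add(pack));
    return active;
  }
  for (const name of raw.split(",")) {
    const trimmed = name.trim();
    // Names are added as given; a real loader would likely validate them.
    if (trimmed !== "") active.add(trimmed);
  }
  return active;
}

// A hidden tool is reachable through tool_invoke only when its pack is active.
// Layered exposure changes what is visible, not what is loaded.
function isInvokable(toolPack: PackName, active: Set<PackName>): boolean {
  return active.has(toolPack);
}

const known = ["core", "discovery", "verification"];
const active = resolvePacks("discovery", known);
console.log(isInvokable("discovery", active));    // pack named in WORMHOLE_PACKS
console.log(isInvokable("verification", active)); // pack left disabled
```

The point of the sketch is the separation of concerns: WORMHOLE_PACKS decides which tools exist, and WORMHOLE_EXPOSURE only decides how many of those appear in the startup schema.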
- Build or refresh repo facts with repo_index_build, durable_repo_index_refresh, or durable_index_manifest_refresh.
- Query with repo_index_query, durable_repo_index_query, repo_intelligence_search, or symbol_context.
- Narrow blast radius with blast_radius_analyze, change_impact_analyze, and test_impact_analyze_v2.
- Run verification planning and checks with test_plan_select, verification_run, diagnostics tools, and lifecycle reviews.
- Record evidence and ask gate_request before final claims.
- Use resume_record, resume_checkpoint, resume_validate, and resume_load for cross-chat continuation.
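
For readers wiring a client by hand, the query step can be expressed as an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The sketch below assumes that repo_index_query accepts the arguments shown in the first retrieval call earlier in this document; that pairing is an assumption, and in layered mode a hidden tool would instead go through tool_invoke, whose argument shape is not described here.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a named tool with the given arguments.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// ASSUMPTION: repo_index_query takes the repoRoot/query/limit arguments
// from the first retrieval call. Check the tool's published schema first.
const request = buildToolCall(1, "repo_index_query", {
  repoRoot: "/absolute/path/to/repo",
  query: "startup entrypoint",
  limit: 5,
});

console.log(JSON.stringify(request, null, 2));
```

Most MCP clients construct this envelope for you; the sketch only shows what travels over the stdio transport.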
- Tools and fields with legacy semantic names currently perform lexical token-overlap retrieval unless a response says otherwise with matchKind.
- symbol_context can use live LSP only for caller-configured TypeScript language servers; repo graph facts still work without LSP.
- Wormhole gates bind cooperating agents and Wormhole-owned write tools. Host-side Edit and Write enforcement requires host support; see docs/examples/claude-code-pretooluse-wormhole.md.
- Python is optional at startup. It is required only for Python-backed sidecar jobs such as graph metrics, graph communities, media extraction, trace summaries, and offline policy evaluation.
- Tree-sitter grammars are optional dependencies. Regex fallback keeps install and indexing usable when native grammar builds fail.
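
To make the first limit concrete, here is a generic sketch of lexical token-overlap scoring. It is not Wormhole's scoring function; the tokenizer and the score definition are assumptions chosen to show why a lexically matched query can miss text that means the same thing in different words.

```typescript
// Split text into lowercase alphanumeric tokens.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Score = fraction of distinct query tokens that also appear in the candidate.
// ASSUMPTION: a simple illustrative definition, not the one Wormhole uses.
function overlapScore(query: string, candidate: string): number {
  const queryTokens = Array.from(new Set(tokenize(query)));
  if (queryTokens.length === 0) return 0;
  const candidateTokens = new Set(tokenize(candidate));
  const hits = queryTokens.filter((token) => candidateTokens.has(token)).length;
  return hits / queryTokens.length;
}

const query = "startup entrypoint";
console.log(overlapScore(query, "server startup entrypoint in main.ts")); // shares both query tokens
console.log(overlapScore(query, "startup hooks"));                        // shares one query token
console.log(overlapScore(query, "boot sequence"));                        // shares no tokens
```

The last call is the case to keep in mind: a synonymous phrase shares no tokens with the query, so a lexical matcher ranks it at the bottom even though a reader would consider it relevant.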
These are source-checkout commands, not npm-package commands. The npm package ships the cited baseline JSON files and the redacted Phase 2 public plan for inspection, but not the internal runnable Phase 2 plan, raw results, external repos, or workspaces. The TypeScript benchmark and eval runners require a source checkout with dev dependencies installed.
Measure schema cost:

    npm run tool_surface_cost
    npx tsx scripts/tool-surface-cost.ts --modes=full,guided,layered --packs=all

Validate baseline files:

    npm run benchmarks:validate

Dry-run the Phase 2 eval matrix:

    npm run phase2:dry-run

Run the actual eval only after configuring the model command arms for the target environment:

    npx tsx scripts/run-phase2-eval.ts --confirm-full-run

Required environment variables for a full eval run:

- WORMHOLE_PHASE2_UNAIDED_COMMAND
- WORMHOLE_PHASE2_LAYERED_COMMAND
- WORMHOLE_PHASE2_FULL_COMMAND
- WORMHOLE_PHASE2_PRIVATE_REPO
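
Before committing to a full run, it can help to check that all four variables are set. The helper below is a hypothetical pre-flight check, not part of the eval harness; treating whitespace-only values as missing is an assumption.

```typescript
// Variable names copied from the list above.
const REQUIRED_PHASE2_VARS: string[] = [
  "WORMHOLE_PHASE2_UNAIDED_COMMAND",
  "WORMHOLE_PHASE2_LAYERED_COMMAND",
  "WORMHOLE_PHASE2_FULL_COMMAND",
  "WORMHOLE_PHASE2_PRIVATE_REPO",
];

// HYPOTHETICAL helper: returns the required variables that are unset or blank.
function missingPhase2Vars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_PHASE2_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example environment with two arms configured (values are placeholders).
const sampleEnv: Record<string, string | undefined> = {
  WORMHOLE_PHASE2_UNAIDED_COMMAND: "placeholder-unaided-command",
  WORMHOLE_PHASE2_LAYERED_COMMAND: "placeholder-layered-command",
};

const missing = missingPhase2Vars(sampleEnv);
if (missing.length > 0) {
  console.log(`Missing Phase 2 variables: ${missing.join(", ")}`);
}
```

In a Node.js script the same function can be called with `process.env` in place of the sample object.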
Requires Node.js 22.5.0 or newer.

    npm install
    npm run typecheck
    npm test
    npm run build
    npm run dogfood:selftest:smoke
    npm run package:smoke

Optional Python sidecar dependencies:

    python -m pip install -r python/requirements.txt

The package metadata is prepared as wormhole-mcp, with a wormhole-mcp bin and npm file whitelist. Local package smoke and npm pack are release-prep checks only. Publishing still requires an authenticated npm publish step; do not treat the npx wormhole-mcp path as externally available until that succeeds.