
siskomx/wormhole


Wormhole

Wormhole is an MCP control plane for coding agents working in large repositories. It gives agents repo indexes, symbol context, impact analysis, verification guidance, evidence records, resume checkpoints, and advisory gates. It is not an autonomous coding agent.

The 0.18 line is a proof-and-hardening release. New claims should point to files under benchmarks/baselines/ or benchmarks/results/.

Release Status

0.18.0 ships on pilot-scoped evidence by explicit decision (2026-07-03): the hardening implementation, full dogfood run (0 failures), full test suite, and rc tarball are done, and the eval gate is closed at pilot scope. The full N=3 two-repo matrix is deferred to 0.19. Benchmark claims below are baseline measurements plus one 12-run pilot, not a general success-rate claim. See docs/planning/eval-readout-2026-07.md for the full readout, decision rules, and threats to validity.

Current Evidence

  • Default core surface: 76 tools in guided/full mode, 45,151 bytes, about 11,288 tokens. Source: benchmarks/baselines/tool-surface-cost.json.
  • Layered core surface: 24 visible tools, 14,844 bytes, about 3,711 tokens. Source: benchmarks/baselines/tool-surface-cost.json.
  • Full all-packs surface remains available by opt-in: 250 tools, 142,187 bytes, about 35,547 tokens. Source: benchmarks/baselines/tool-surface-cost-all-packs.json.
  • Layered all-packs surface keeps the startup schema small: 25 visible tools, 15,221 bytes, about 3,806 tokens. Source: benchmarks/baselines/tool-surface-cost-all-packs.json.
  • Large-repo baseline against VS Code: 14,147 files, 368,616 symbols, 499,585 edges, 1,586,548,736 byte SQLite DB, 442,462 ms build time, 84,111 ms measured query latency. Source: benchmarks/baselines/large-repo-index-vscode.json.
  • Self reachability baseline records the required pre-deletion audit run for this repo and keeps all removal findings advisory. Source: benchmarks/baselines/repo-reachability-self.json.
  • Pilot eval (12 runs, vscode @ d70651f2239c, N=1 per cell, proxy tokens): wormhole-layered 2/4 task success vs 1/4 for both unaided and wormhole-full, with ~25% fewer tokens and ~25% less wall-clock time than unaided. Two of the four task-type checks were later found defective and fixed after the run; see the caveat in docs/planning/eval-readout-2026-07.md. Source: benchmarks/results/2026-07-03T21-15-00Z-recomputed/.
  • Source checkouts contain the Phase 2 eval harness at benchmarks/phase2/phase2-eval.ts and the internal runnable plan at benchmarks/phase2/plan.json. The npm package does not ship that internal plan; it ships only redacted Phase 2 public evidence at benchmarks/phase2/public-plan.json plus the baseline JSON files. The full N=3 two-repo matrix has not run; Wormhole claims pilot-scoped evidence only, not a general real-repo success-rate or token-efficiency win.
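The token figures in the surface-cost bullets are consistent with a simple ceil(bytes / 4) estimate. That is an inference from the published numbers, not a documented property of scripts/tool-surface-cost.ts. The sketch below reproduces the four token counts under that assumption and derives how much startup schema layered exposure saves.

```typescript
// Assumption: tokens ≈ ceil(bytes / 4). Inferred from the baseline numbers,
// not taken from scripts/tool-surface-cost.ts.
function estimateTokens(bytes: number): number {
  return Math.ceil(bytes / 4);
}

// Fractional reduction in schema bytes when switching from one surface to another.
function reduction(fromBytes: number, toBytes: number): number {
  return 1 - toBytes / fromBytes;
}

// Byte counts copied from the Current Evidence list.
const surfaces = {
  coreDefault: 45_151,
  coreLayered: 14_844,
  allPacksFull: 142_187,
  allPacksLayered: 15_221,
} as const;

for (const [name, bytes] of Object.entries(surfaces)) {
  console.log(`${name}: ${bytes} bytes ~ ${estimateTokens(bytes)} tokens`);
}

const pct = (x: number): string => `${(x * 100).toFixed(1)}%`;
console.log("core, layered vs default:", pct(reduction(surfaces.coreDefault, surfaces.coreLayered)));
console.log("all packs, layered vs full:", pct(reduction(surfaces.allPacksFull, surfaces.allPacksLayered)));
```

Running it yields 11288, 3711, 35547, and 3806 tokens, matching the baselines, and byte reductions of 67.1% (core) and 89.3% (all packs).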

Quickstart

Once the package is published (see Distribution Status), or from a locally installed copy, run the server:

npx wormhole-mcp

Use layered exposure when startup token cost matters:

WORMHOLE_EXPOSURE=layered npx wormhole-mcp

PowerShell:

$env:WORMHOLE_EXPOSURE = "layered"; npx wormhole-mcp

For MCP clients on Windows, prefer the JSON env form below.

Claude Code MCP config:

claude mcp add-json wormhole '{"type":"stdio","command":"npx","args":["-y","wormhole-mcp"],"env":{"WORMHOLE_EXPOSURE":"layered"}}'

Project .mcp.json:

{
  "mcpServers": {
    "wormhole": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "wormhole-mcp"],
      "env": {
        "WORMHOLE_EXPOSURE": "layered"
      }
    }
  }
}
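If a project already has a .mcp.json with other servers, the wormhole entry can be merged in instead of overwriting the file. A minimal sketch, assuming the file follows the mcpServers shape shown above and only stdio entries are involved; addWormholeServer is a hypothetical helper, not part of Wormhole.

```typescript
// Shape of one stdio server entry, matching the example .mcp.json above.
interface StdioServer {
  type: "stdio";
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers?: Record<string, StdioServer>;
  [key: string]: unknown;
}

// Hypothetical helper: returns a new config with the wormhole entry added
// (or replaced), leaving every other server and top-level key untouched.
function addWormholeServer(config: McpConfig, exposure = "layered"): McpConfig {
  const wormhole: StdioServer = {
    type: "stdio",
    command: "npx",
    args: ["-y", "wormhole-mcp"],
    env: { WORMHOLE_EXPOSURE: exposure },
  };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), wormhole },
  };
}

// Example input with one unrelated (hypothetical) server already configured.
const existing: McpConfig = {
  mcpServers: {
    other: { type: "stdio", command: "node", args: ["other-server.js"] },
  },
};
console.log(JSON.stringify(addWormholeServer(existing), null, 2));
```

File reading and writing is left out so the transformation stays a pure, testable function.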

Example arguments for a first retrieval call:

{ "repoRoot": "/absolute/path/to/repo", "query": "startup entrypoint", "limit": 5 }
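Over the stdio transport, an MCP client sends an argument object like this inside a JSON-RPC 2.0 tools/call request, one JSON message per line. The sketch below only builds that message and assumes the initialize handshake has already completed. The tool name repo_index_query is an illustrative pick from the Core Workflow list, not a statement of which tool these example arguments belong to; in layered mode a hidden tool would instead go through tool_invoke.

```typescript
// Arguments copied from the example above.
interface RetrievalArgs {
  repoRoot: string;
  query: string;
  limit: number;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: RetrievalArgs };
}

// Builds one newline-delimited JSON-RPC message for MCP's stdio transport.
// JSON.stringify without indentation escapes newlines inside strings, so the
// only raw newline is the trailing delimiter.
function frameToolCall(id: number, name: string, args: RetrievalArgs): string {
  const request: ToolCallRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
  return JSON.stringify(request) + "\n";
}

const line = frameToolCall(2, "repo_index_query", {
  repoRoot: "/absolute/path/to/repo",
  query: "startup entrypoint",
  limit: 5,
});
// A real client would write `line` to the server's stdin.
console.log(line.trimEnd());
```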

Tool Loading

Exposure controls schema visibility:

  • WORMHOLE_EXPOSURE=layered: registers the small visible surface plus tool_invoke.
  • WORMHOLE_EXPOSURE=guided: registers the active load unit with guidance-oriented descriptions.
  • WORMHOLE_EXPOSURE=full: registers the active load unit without layered hiding.

Pack loading controls which tools exist in the active load unit:

  • unset or WORMHOLE_PACKS=core: repo index/query/symbol/impact/verify/gate/resume core.
  • WORMHOLE_PACKS=all: all current tools.
  • WORMHOLE_PACKS=discovery,verification: core plus the named packs.

In layered mode, hidden active tools can be called through tool_invoke. Tools from disabled packs are not invokable through that dispatcher.
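The pack-loading and dispatch rules above can be modeled in a few lines. This is a sketch of the documented behavior, not Wormhole's source: ALL_PACKS is a hypothetical stand-in for the real pack registry, treating an empty WORMHOLE_PACKS as unset is an assumption, and the tool-to-pack table uses made-up tool names except repo_index_query, which is assumed to be in core.

```typescript
type Pack = string;

// Hypothetical stand-in for the real pack registry.
const ALL_PACKS: Pack[] = ["core", "discovery", "verification"];

// Resolves WORMHOLE_PACKS per the rules above: unset (or empty, assumed) or
// "core" -> core only; "all" -> every pack; a comma list -> core plus the named packs.
function resolvePacks(raw: string | undefined): Set<Pack> {
  const value = raw?.trim() ?? "";
  if (value === "" || value === "core") return new Set(["core"]);
  if (value === "all") return new Set(ALL_PACKS);
  const named = value.split(",").map((p) => p.trim()).filter((p) => p !== "");
  return new Set<Pack>(["core", ...named]);
}

// tool_invoke can reach a tool only if its pack is active, whether or not
// layered exposure hides that tool's schema.
function canDispatch(toolPack: Pack, active: Set<Pack>): boolean {
  return active.has(toolPack);
}

// Hypothetical tool-to-pack assignments for illustration only.
const TOOL_PACK: Record<string, Pack> = {
  repo_index_query: "core",
  example_discovery_tool: "discovery",
  example_verification_tool: "verification",
};

const active = resolvePacks("discovery");
console.log(canDispatch(TOOL_PACK.repo_index_query, active)); // true
console.log(canDispatch(TOOL_PACK.example_discovery_tool, active)); // true
console.log(canDispatch(TOOL_PACK.example_verification_tool, active)); // false
```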

Core Workflow

  1. Build or refresh repo facts with repo_index_build, durable_repo_index_refresh, or durable_index_manifest_refresh.
  2. Query with repo_index_query, durable_repo_index_query, repo_intelligence_search, or symbol_context.
  3. Narrow blast radius with blast_radius_analyze, change_impact_analyze, and test_impact_analyze_v2.
  4. Run verification planning and checks with test_plan_select, verification_run, diagnostics tools, and lifecycle reviews.
  5. Record evidence and call gate_request before making final claims.
  6. Use resume_record, resume_checkpoint, resume_validate, and resume_load for cross-chat continuation.
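A cooperating client could sanity-check an agent's tool-call trace against steps 1 to 4 before it calls gate_request. This is a hypothetical client-side check, not a Wormhole feature. It maps only the tools named in the list to their step, ignores everything else (diagnostics, lifecycle reviews, evidence recording, resume tools), and does not require strict ordering, since agents legitimately loop back to earlier steps.

```typescript
// Steps 1-4 from the workflow list above, keyed by tool name.
// A Map avoids false hits on Object.prototype keys such as "toString".
const STEP_OF = new Map<string, number>([
  ["repo_index_build", 1],
  ["durable_repo_index_refresh", 1],
  ["durable_index_manifest_refresh", 1],
  ["repo_index_query", 2],
  ["durable_repo_index_query", 2],
  ["repo_intelligence_search", 2],
  ["symbol_context", 2],
  ["blast_radius_analyze", 3],
  ["change_impact_analyze", 3],
  ["test_impact_analyze_v2", 3],
  ["test_plan_select", 4],
  ["verification_run", 4],
]);

// Returns the steps (1-4) with no call before the first gate_request.
// If there is no gate_request, the whole trace is checked.
function missingStepsBeforeGate(trace: string[]): number[] {
  const gateAt = trace.indexOf("gate_request");
  const before = gateAt === -1 ? trace : trace.slice(0, gateAt);
  const seen = new Set<number>();
  for (const tool of before) {
    const step = STEP_OF.get(tool);
    if (step !== undefined) seen.add(step);
  }
  return [1, 2, 3, 4].filter((s) => !seen.has(s));
}

// Returns [3]: no impact analysis happened before the gate request.
console.log(
  missingStepsBeforeGate(["repo_index_build", "symbol_context", "verification_run", "gate_request"]),
);
```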

Honesty Notes

  • Tools and fields with legacy "semantic" names currently perform lexical token-overlap retrieval unless the response's matchKind field says otherwise.
  • symbol_context can use live LSP only for caller-configured TypeScript language servers; repo graph facts still work without LSP.
  • Wormhole gates bind cooperating agents and Wormhole-owned write tools. Host-side Edit and Write enforcement requires host support; see docs/examples/claude-code-pretooluse-wormhole.md.
  • Python is optional at startup. It is required only for Python-backed sidecar jobs such as graph metrics, graph communities, media extraction, trace summaries, and offline policy evaluation.
  • Tree-sitter grammars are optional dependencies. Regex fallback keeps install and indexing usable when native grammar builds fail.
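Lexical token-overlap retrieval ranks documents by how many query tokens they literally contain, with no embeddings or synonym expansion, so a query for "startup entrypoint" will not match a file that only says "bootstrap main". The scorer below is a generic illustration of that idea, not Wormhole's scoring function; the tokenizer (lowercase, split on non-alphanumerics and camelCase boundaries) and the fraction-of-query-tokens score are assumptions.

```typescript
// Lowercase and split on non-alphanumerics; camelCase is split first.
function tokenize(text: string): Set<string> {
  const spaced = text.replace(/([a-z0-9])([A-Z])/g, "$1 $2");
  return new Set(spaced.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Fraction of distinct query tokens that also appear in the document.
function overlapScore(query: string, doc: string): number {
  const q = tokenize(query);
  if (q.size === 0) return 0;
  const d = tokenize(doc);
  const hits = Array.from(q).filter((t) => d.has(t)).length;
  return hits / q.size;
}

interface Hit {
  path: string;
  score: number;
}

// Ranks documents by overlap, dropping zero-score matches.
function rank(query: string, docs: Record<string, string>, limit: number): Hit[] {
  return Object.entries(docs)
    .map(([path, text]) => ({ path, score: overlapScore(query, text) }))
    .filter((h) => h.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Hypothetical file contents for illustration.
const docs: Record<string, string> = {
  "src/main.ts": "export function startServer() { /* startup entrypoint */ }",
  "src/boot.ts": "bootstrap main loop",
  "README.md": "Startup notes",
};
console.log(rank("startup entrypoint", docs, 5));
```

The result lists src/main.ts (score 1) and README.md (score 0.5); src/boot.ts is dropped because it shares no literal tokens with the query.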

Benchmarks

These are source-checkout commands, not npm-package commands. The npm package ships the cited baseline JSON files and the redacted Phase 2 public plan for inspection, but not the internal runnable Phase 2 plan, raw results, external repos, or workspaces. The TypeScript benchmark and eval runners require a source checkout with dev dependencies installed.

Measure schema cost:

npm run tool_surface_cost
npx tsx scripts/tool-surface-cost.ts --modes=full,guided,layered --packs=all

Validate baseline files:

npm run benchmarks:validate

Dry-run the Phase 2 eval matrix:

npm run phase2:dry-run

Run the actual eval only after configuring the model command arms for the target environment:

npx tsx scripts/run-phase2-eval.ts --confirm-full-run

Required environment variables for a full eval run:

  • WORMHOLE_PHASE2_UNAIDED_COMMAND
  • WORMHOLE_PHASE2_LAYERED_COMMAND
  • WORMHOLE_PHASE2_FULL_COMMAND
  • WORMHOLE_PHASE2_PRIVATE_REPO
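Before passing --confirm-full-run, a small preflight can fail fast when any of the four variables is unset. This is a hypothetical helper, not part of scripts/run-phase2-eval.ts; it only checks presence, not whether the commands run or the private repo path exists, and the "my-agent" command strings are made up.

```typescript
const REQUIRED_PHASE2_ENV = [
  "WORMHOLE_PHASE2_UNAIDED_COMMAND",
  "WORMHOLE_PHASE2_LAYERED_COMMAND",
  "WORMHOLE_PHASE2_FULL_COMMAND",
  "WORMHOLE_PHASE2_PRIVATE_REPO",
] as const;

// Returns the names of required variables that are unset or blank.
function missingPhase2Env(env: Record<string, string | undefined>): string[] {
  return REQUIRED_PHASE2_ENV.filter((name) => (env[name] ?? "").trim() === "");
}

// Hypothetical partial environment: two of the four variables are set.
const example = {
  WORMHOLE_PHASE2_UNAIDED_COMMAND: "my-agent --unaided",
  WORMHOLE_PHASE2_LAYERED_COMMAND: "my-agent --mcp layered",
};
const missing = missingPhase2Env(example);
if (missing.length > 0) {
  console.error(`Refusing full run; missing: ${missing.join(", ")}`);
}
```

In a real script, pass process.env instead of the example object.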

Local Development

Requires Node.js 22.5.0 or newer.

npm install
npm run typecheck
npm test
npm run build
npm run dogfood:selftest:smoke
npm run package:smoke
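The Node.js floor can be checked with a plain major/minor/patch comparison. A minimal sketch; meetsNodeFloor is a hypothetical name, and pre-release or build suffixes are ignored.

```typescript
// Minimum version from the requirement above.
const MIN_NODE: [number, number, number] = [22, 5, 0];

// Parses "v22.5.1" or "22.5.1" into [major, minor, patch], ignoring any
// suffix after the patch number.
function parseVersion(v: string): [number, number, number] {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`Unrecognized version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True when `v` is at least MIN_NODE, comparing major, then minor, then patch.
function meetsNodeFloor(v: string): boolean {
  const have = parseVersion(v);
  for (let i = 0; i < 3; i++) {
    if (have[i] !== MIN_NODE[i]) return have[i] > MIN_NODE[i];
  }
  return true;
}

console.log(meetsNodeFloor("22.4.1")); // false
console.log(meetsNodeFloor("22.5.0")); // true
console.log(meetsNodeFloor("v24.0.0")); // true
```

In Node, pass process.versions.node (no leading "v") or process.version (with it); the parser accepts both.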

Optional Python sidecar dependencies:

python -m pip install -r python/requirements.txt

Distribution Status

The package metadata is prepared under the name wormhole-mcp, with a wormhole-mcp bin entry and an npm files whitelist. The local package smoke test and npm pack are release-prep checks only. Publishing still requires an authenticated npm publish step; do not treat the npx wormhole-mcp path as externally available until that succeeds.

About

Evidence-aware MCP operating layer for AI coding agents.
