Skip to content

Phase 2: real e2e harness — spawn build/index.js + nock + isolated HOME#37

Merged
SPerekrestova merged 6 commits into
devin/1778098342-phase-0-hygienefrom
devin/1778177573-phase-2-e2e-harness
May 7, 2026
Merged

Phase 2: real e2e harness — spawn build/index.js + nock + isolated HOME#37
SPerekrestova merged 6 commits into
devin/1778098342-phase-0-hygienefrom
devin/1778177573-phase-2-e2e-harness

Conversation

@devin-ai-integration
Copy link
Copy Markdown

Summary

Phase 2 of the redesign plan — replace the Phase 0 placeholder spec with a real end-to-end suite that exercises the MCP server as a black box: each spec spawns the built binary (build/index.js) as a child process over stdio, attaches the SDK's StdioClientTransport, and drives the server exactly like a real client would. Stacked on top of #36; base auto-rebases to main once #36 reaches main.

Harness pieces (one commit):

  • tests/e2e/harness/preload.mjs — runs inside the spawned child via NODE_OPTIONS=--import … before any user code. Activates nock with disableNetConnect() so the child can never accidentally reach the real leetcode.com, then reads a JSON fixture from process.env.E2E_FIXTURE_PATH and installs interceptors that replay canned GraphQL / REST responses.
  • tests/e2e/harness/spawn-server.tsspawnServer({ fixture?, home?, env? }). mkdtemps a fresh HOME per call (so ~/.leetcode-mcp/credentials.json is per-test), writes the fixture to a temp file, spawns node build/index.js with the preload + isolated env, and returns a connected MCP Client. Specs that need to pre-seed credentials can pass their own home.
  • tests/e2e/harness/global-setup.ts — vitest globalSetup hook that rebuilds build/index.js if src/ is newer. Without this, editing source and running test:e2e would silently exercise a stale binary.
  • tests/e2e/harness/types.ts — shared E2EFixture type, dependency-free so both vitest specs and the lightweight preload script can import it.
  • vitest.e2e.config.ts — dedicated config (only includes tests/e2e/**, 30s test timeout, wires globalSetup).
  • package.jsontest:e2e uses the new config; default test script now excludes tests/e2e/** so unit/integration runs stay fast (~13s vs ~17s with e2e). test:all chains test + test:e2e.
  • New devDep: nock@^14.0.15.

Specs (three commits, 7 tests total):

  • lifecycle.test.ts (4 tests) — server name/version are non-empty after handshake; the registered tools / prompts / resource templates match the expected set. Locks the wire-level surface area: any drift in tool names or registry shape now fails CI before clients do.
  • auth-restore.test.ts (2 tests) — silent-logout-on-restart regression. Spawns a real server with a pre-seeded ~/.leetcode-mcp/credentials.json and a mocked userStatus GraphQL response, then calls check_auth_status over stdio and asserts authenticated: true, username: "alice". Also asserts the negative path (no credentials file → authenticated: false). If the Phase 1 fix regresses this fails.
  • problem-flow.test.ts (1 test) — happy path. Mocks the leetcode-query question(titleSlug:…) GraphQL operation, calls get_problem, asserts the wire-level envelope shape ({ titleSlug, problem } with topicTags projected to string[]).

Drops: the Phase 0 placeholder.test.ts (no longer needed now that the directory has real specs).

Tests: unit/integration 152/152 (was 153 with placeholder); e2e 7/7. npm run build clean; npm run test:types clean; npm run format clean.

Why this design vs. the alternatives:

  • nock runs inside the child via the preload script rather than the parent test process — the parent can't intercept HTTP from a separately-spawned process, so this is the only way to exercise the real binary while still serving canned responses.
  • globalSetup runs npm run build (only when src/ is newer) rather than relying on the developer to remember. ~1s up front beats a silent stale-binary class of bug.
  • Dedicated vitest.e2e.config.ts keeps the slow spawn-the-binary specs out of the default test run.

Review & Testing Checklist for Human

  • Verify npm run test:e2e passes locally (it builds the binary if needed, then spawns three child specs)
  • Verify the lifecycle expected-tools assertion still matches what's actually registered if you've added or renamed any tools
  • Spot-check tests/e2e/harness/preload.mjs — particularly the times === undefined → scope.persist() branch, since nock 14's .persist() lives on the Scope (not the Interceptor) and that's a subtle one to get right
  • Confirm the --import ESM preload path works on your platform (tested on Linux Node 22; macOS / Windows untested in CI)

Notes

  • This PR is stacked on top of Phase 1: types, zod runtime validation, structured errors, auth restore bugfix #36 (Phase 1). Once Phase 1: types, zod runtime validation, structured errors, auth restore bugfix #36 reaches main, the base will auto-rebase and this PR's diff will shrink to just the four Phase 2 commits.
  • Phase 2 deliberately ships only three specs to keep the PR scope focused on the harness. Future PRs (or smaller follow-ups) can add coverage for submit_solution, save_leetcode_credentials, resource reads, etc. — adding a spec is now ~10 lines of fixture + test.
  • nock 14 brought one transitive moderate-severity warning (@hono/node-server/hono chain) per npm audit; not blocking and unrelated to runtime code.
  • Defaulted the still-open questions from earlier rounds to the lowest-risk choice: strict mode off, workspace cwd/${slug}.${ext}, instructions field with get_started kept as deprecated stub. Each is reversible with a single env var or follow-up PR.

Link to Devin session: https://app.devin.ai/sessions/d003a60939484686b2953ae32fe2794d
Requested by: @SPerekrestova

Adds the infrastructure for spawning the real MCP server binary as a
child process, attaching the SDK's StdioClientTransport, and serving
LeetCode HTTP from a JSON fixture via a nock-activating preload script.

- nock devDep (^14.0.15)
- tests/e2e/harness/preload.mjs   activates nock + replays fixture
- tests/e2e/harness/spawn-server.ts  mkdtemp HOME + StdioClientTransport
- tests/e2e/harness/global-setup.ts  rebuilds build/index.js if stale
- tests/e2e/harness/types.ts      shared fixture types (preload + tests)
- vitest.e2e.config.ts            dedicated config + globalSetup wiring
- package.json                    test:e2e uses the new config; the
                                  default 'test' script now excludes
                                  tests/e2e so unit/integration runs
                                  stay fast
Locks in the wire-level surface area: server name/version are non-empty
after the MCP handshake, and the registered tools / prompts / resource
templates match the expected set. Any drift in tool names or registry
shape now fails CI before clients do.
Spawns a real server with a pre-seeded ~/.leetcode-mcp/credentials.json
and a mocked userStatus GraphQL response, then calls check_auth_status
over stdio. Fails if the Phase 1 fix regresses (i.e., the on-disk creds
are read but never pushed into the in-memory Credential).

Also asserts the negative path: with no credentials file, the tool
reports authenticated=false.
Spawns the server, mocks the leetcode-query 'question(titleSlug:…)'
GraphQL operation, and asserts get_problem returns the expected
{ titleSlug, problem } envelope with a topicTags-as-string[] projection.

Also rewrites tests/e2e/README.md as the harness's actual user-facing
docs (how it mocks HTTP, how to author a fixture, why the default 'test'
script excludes tests/e2e), and removes the Phase 0 placeholder spec
now that the directory has real specs.
@devin-ai-integration
Copy link
Copy Markdown
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Copy link
Copy Markdown
Owner

Code review

Overall this is a thoughtful, well-documented harness. Two concrete bugs, both in the harness itself:

Bug 1 — Stale-binary check defeats its own purpose

tests/e2e/harness/global-setup.ts (lines 30-41)

needsRebuild only stats src/index.ts vs build/index.js:

const [binStat, srcStat] = await Promise.all([
    stat("build/index.js"),
    stat("src/index.ts")
]);
return binStat.mtimeMs < srcStat.mtimeMs;

The file's docstring says "ensures … build/index.js … is at least as fresh as src/", but in practice editing src/services/leetcode.ts, src/tools/*.ts, src/types/*.ts, etc. won't bump src/index.ts's mtime, so needsRebuild returns false and the e2e suite silently exercises a stale binary — the very class of bug this hook is supposed to prevent. Walk src/ (or all **/*.ts under it) and compare the max mtime, or just always rebuild and let tsc's incremental cache make it cheap.

Bug 2 — pathToImportUrl doesn't percent-encode

tests/e2e/harness/spawn-server.ts (lines 128-130)

function pathToImportUrl(absolutePath: string): string {
    return new URL(`file://${absolutePath}`).href;
}

This is the wrong primitive — new URL("file://" + path) does not percent-encode spaces or other URL-reserved characters. If PRELOAD resolves through a directory containing a space (~/My Projects/..., /Users/Some Name/..., common on macOS), --import file:///Users/Some Name/.../preload.mjs will fail or be parsed incorrectly. The standard library has the right tool, and fileURLToPath is already imported in this file — just import its sibling:

import { fileURLToPath, pathToFileURL } from "node:url";
...
return pathToFileURL(absolutePath).href;

Smaller drawback (worth flagging, not blocking)

spawn-server.ts lines 74-82 — fixture file leaks into caller-owned HOME. When options.home is supplied, the harness still writes fixture.json into that directory but never deletes it (cleanup is a no-op for caller-provided homes). It happens to work in auth-restore.test.ts because the test rm's the whole dir, but the contract "harness only cleans up homes it created" is silently violated for fixture.json. Either write the fixture to its own temp file independent of HOME, or document explicitly that callers providing home get a fixture.json byproduct.

Everything else (the nock preload's per-entry closure capture, process.exit(1) on malformed fixtures, env minimization, transport cleanup ordering) looks correct. The Windows portability caveats (HOME vs USERPROFILE, npm.cmd via execFile) are already acknowledged in the PR body, so not flagging those.


Generated by Claude Code

Comparing build/index.js mtime against only src/index.ts let edits to
any other module slip past the freshness check, exercising a stale
binary against the new specs — the very class of bug the hook was
supposed to prevent.

Now recursively walks src/, takes the max mtime of every .ts file, and
rebuilds when the binary lags behind any of them.

Verified manually: `touch src/utils/logger.ts && npm run test:e2e`
triggers a full rebuild (~3s) and bumps the binary mtime above the
touched source file.
…er HOME

Two fixes in spawn-server.ts surfaced by review:

1. `new URL(`file://${path}`)` does not percent-encode path segments,
   so `--import file:///Users/Some Name/.../preload.mjs` would fail when
   the harness lived under a path containing spaces or other URL-reserved
   characters (common on macOS user dirs). Switched to
   `pathToFileURL(PRELOAD).href` which is the documented primitive.

2. When a caller supplied `options.home`, the harness still wrote
   `fixture.json` into that directory but cleanup only removed homes
   the harness created. Specs that pre-seed credentials silently got an
   extra `fixture.json` byproduct. Now the fixture lives in its own
   `mkdtemp` directory regardless of who owns HOME, and the harness
   always cleans it up.
@devin-ai-integration
Copy link
Copy Markdown
Author

Both bugs and the smaller drawback are real — fixed in bb37ace (stale-binary) and 1f5eafd (preload URL + fixture isolation). CI green; e2e 7/7.

Bug 1 — stale-binary check (tests/e2e/harness/global-setup.ts)
You're right, the docstring lied. needsRebuild now walks src/ recursively and takes the max mtime over every .ts file under it, so edits anywhere in the tree (src/leetcode/*.ts, src/types/*.ts, etc.) trigger a rebuild. Verified manually: touch src/utils/logger.ts && npm run test:e2e rebuilds and bumps build/index.js's mtime past the touched file.

Bug 2 — pathToImportUrl doesn't percent-encode (tests/e2e/harness/spawn-server.ts)
Switched to pathToFileURL(PRELOAD).href. pathToFileURL is already exported from the same node:url module that the file imports for fileURLToPath, so it's a one-line dependency add and the documented primitive for exactly this conversion.

Smaller drawback — fixture leaks into caller-owned HOME (tests/e2e/harness/spawn-server.ts)
Agreed. The harness now mkdtemps a separate fixture directory regardless of who owns HOME, and cleanup() removes it unconditionally. So a caller supplying home no longer gets a stray fixture.json byproduct, and the "harness only cleans up resources it created" contract holds again.

@SPerekrestova SPerekrestova merged commit f66913d into devin/1778098342-phase-0-hygiene May 7, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant