Skip to content

Devin/1778098342 phase 0 hygiene#44

Open
SPerekrestova wants to merge 26 commits into
mainfrom
devin/1778098342-phase-0-hygiene
Open

Devin/1778098342 phase 0 hygiene#44
SPerekrestova wants to merge 26 commits into
mainfrom
devin/1778098342-phase-0-hygiene

Conversation

@SPerekrestova
Copy link
Copy Markdown
Owner

No description provided.

SPerekrestova and others added 26 commits May 6, 2026 23:56
- problem.ts: Problem, SimplifiedProblem, ProblemSummary, ProblemSearchResult, DailyChallenge, CodeSnippet, TopicTag, SimilarQuestion
- user.ts: UserStatus, UserProfile, UserContestInfo, UserAllSubmissions, UserRecentSubmissions, UserRecentACSubmissions, UserSubmissionDetail, UserProgressQuestionList, SubmissionRow
- solution.ts: SolutionArticleSummary, SolutionArticleList, SolutionArticleDetail
- errors.ts: ErrorCode discriminated union, LeetCodeError class, isLeetCodeError type guard
- index.ts: re-exports

Types describe the shapes returned by LeetcodeServiceInterface methods —
the projected envelopes the service emits, not the raw upstream GraphQL
payloads. Existing src/types/credentials.ts and src/types/submission.ts
unchanged (their shapes already match the interface).

No behavior change; no consumers wired up yet.
src/leetcode/schemas.ts exports zod validators for the four
LeetCode response shapes the server actually parses today:

- SubmitResponseSchema: response from POST /problems/<slug>/submit/
- CheckResponseSchema: response from GET /submissions/detail/<id>/check/
- QuestionIdResponseSchema: response from the questionId GraphQL query
- ValidateCredentialsResponseSchema: response from the userStatus GraphQL query

Each schema uses .passthrough() so unexpected fields don't fail the
parse — only missing required fields do. CheckResponse is intentionally
loose on status_msg, code_answer, and expected_answer because LeetCode
omits / changes their shape between PENDING and SUCCESS states.

z.infer<> types are exported alongside each schema for use in the
service impl. No behavior change; schemas not yet consumed.
LeetcodeServiceInterface:
- Replace every Promise<any> with concrete typed returns drawn from
  src/types/{problem,user,solution,submission}
- Add SolutionArticlesQueryOptions for fetchQuestionSolutionArticles
- Method signatures unchanged in name/arity; only return types tightened

LeetCodeGlobalService:
- Implement all methods against the new return types
- Replace stringly-typed throws with LeetCodeError(ErrorCode.AUTH_REQUIRED, …)
  on every authenticated path; LeetCodeError(ErrorCode.PROBLEM_NOT_FOUND, …)
  on missing problems
- Validate the four LeetCode response shapes via zod at the wire boundary:
  submitSolution() now safeParses both SubmitResponse and CheckResponse and
  throws LeetCodeError(ErrorCode.UPSTREAM_PAYLOAD_INVALID, …) on schema
  failure with the zod issue list attached as .cause
- getQuestionId() validates QuestionIdResponseSchema and surfaces a typed
  error instead of returning a stringified shape
- validateCredentials() validates ValidateCredentialsResponseSchema; on
  schema failure logs a warning and returns null (preserves the previous
  null-on-failure contract for the auth path)
- Project nullable upstream fields (runtime_percentile, total_correct, etc.)
  into the SubmissionResult shape via ?? undefined so consumers see
  number | undefined, not number | null | undefined

This commit removes 15+ Promise<any> from the interface and ~20 'as any'
casts from the implementation. fetchSolutionArticleDetail() now returns
SolutionArticleDetail | null (LeetCode actually returns null for unknown
topicIds; previous 'as ... | undefined' was a lie).
…n-memory

Fixes the silent-logout-on-restart bug from the assessment: the server
loaded ~/.leetcode-mcp/credentials.json from disk, validated them, and
told the user they were 'authenticated' — but never actually pushed the
cookies into the in-memory Credential the LeetCode client reads from.
Result: every authenticated tool call after a restart failed with
'Authentication required' until the user re-pasted their cookies.

Three pieces:

- src/auth/auth-flow.ts (new): restoreCredentials(service, storage) loads
  the persisted credentials, calls service.validateCredentials() to
  confirm they're still good, and on success calls
  service.updateCredentials() to push them into the running service.
  Returns a typed RestoreOutcome ('no_credentials' | 'invalid' |
  'restored') for logging. Never throws.

- src/index.ts: invoke restoreCredentials(leetcodeService) at startup
  before tools/prompts are registered. Best-effort: any failure leaves
  the server unauthenticated as before.

- src/mcp/tools/auth-tools.ts: in check_auth_status, when validation
  succeeds, also call leetcodeService.updateCredentials() so the very
  next authenticated tool call works without forcing a server restart.
- tests/auth/auth-flow.test.ts (new, 5 tests): exercises restoreCredentials
  with a mocked service+storage across the four outcome paths
  (no_credentials / load_failed / expired / restored) plus the
  validate-throws path
- tests/mcp/tools/auth-tools.test.ts: extend the existing
  check_auth_status happy-path test to assert
  service.updateCredentials() is called (regression guard for the
  silent-logout bug)
- tests/services/{problem,solution}-services.test.ts: tighten access
  patterns now that return types are concrete (optional chaining where
  upstream fields can legitimately be missing). solution test asserts
  toBeNull() for the invalid-topicId case to match
  fetchSolutionArticleDetail's actual return.

Test count: 146 → 151 (5 new auth-flow tests). All 151 pass; test:types
passes; build clean; npm audit reports 0.
Drive-by formatting fix from running 'npm run format' during Phase 1.
No behavior change.
Address review comments on PR #36:

- fetchUserAllSubmissions: drop `| null` from interface lang/status so
  it matches the impl. No callers ever pass null.
- fetchUserProgressQuestionList: tighten impl to require
  `filters: { offset: number; limit: number; ... }` matching the interface.
  The single caller (get_user_progress_questions tool) zod-defaults
  offset/limit, so they're always passed.
- fetchProblem: now throws LeetCodeError(PROBLEM_NOT_FOUND) on missing
  upstream rather than returning a typed-but-actually-null Problem.
  Removes the contradictory shape where the impl returned null but the
  signature claimed `Promise<Problem>`. fetchProblemSimplified no longer
  needs its own null-check.
- UserStatus.username and .avatar are now `string | null` (was `string`
  defaulted to ""). With `isSignedIn: false` and `username: ""` consumers
  couldn't distinguish 'signed out' from 'signed in, no display name'.

No behavior change for happy paths; error paths now use the structured
LeetCodeError instead of a stringified misshape.
Addresses review on PR #36: redeclaring 'public readonly cause?: unknown'
on LeetCodeError shadowed the native ES2022 Error.cause field via the
class-field installer (under useDefineForClassFields). Anything walking
the standard chain via `err.cause` saw the right thing only by accident.

Use `super(message, { cause })` and drop the field — same external
API, but err.cause now refers to the actual native chain so loggers and
debuggers that expect ES2022 semantics work consistently.
…ry/catch

Addresses review on PR #36:

- Extract validate->updateCredentials sequence into a small reusable
  helper applyValidatedCredentials(service, csrf, session) so both
  restoreCredentials() and check_auth_status share one implementation.
- Drop the try/catch in restoreCredentials that mapped any throw to
  {status: 'invalid', reason: 'expired'}. The interface contract for
  validateCredentials is Promise<string | null>; the catch was dead code
  against any conforming impl. If a future impl violates the contract by
  throwing, surfacing the error is more useful than silently swallowing
  it.
- Update auth-flow tests: replace the dead 'swallows the error when
  validate throws' test with one asserting that exceptions propagate;
  add unit tests for the new applyValidatedCredentials helper.

Test count: 153 (was 151). All passing.
Phase 1: types, zod runtime validation, structured errors, auth restore bugfix
Adds the infrastructure for spawning the real MCP server binary as a
child process, attaching the SDK's StdioClientTransport, and serving
LeetCode HTTP from a JSON fixture via a nock-activating preload script.

- nock devDep (^14.0.15)
- tests/e2e/harness/preload.mjs   activates nock + replays fixture
- tests/e2e/harness/spawn-server.ts  mkdtemp HOME + StdioClientTransport
- tests/e2e/harness/global-setup.ts  rebuilds build/index.js if stale
- tests/e2e/harness/types.ts      shared fixture types (preload + tests)
- vitest.e2e.config.ts            dedicated config + globalSetup wiring
- package.json                    test:e2e uses the new config; the
                                  default 'test' script now excludes
                                  tests/e2e so unit/integration runs
                                  stay fast
Locks in the wire-level surface area: server name/version are non-empty
after the MCP handshake, and the registered tools / prompts / resource
templates match the expected set. Any drift in tool names or registry
shape now fails CI before clients do.
Spawns a real server with a pre-seeded ~/.leetcode-mcp/credentials.json
and a mocked userStatus GraphQL response, then calls check_auth_status
over stdio. Fails if the Phase 1 fix regresses (i.e., the on-disk creds
are read but never pushed into the in-memory Credential).

Also asserts the negative path: with no credentials file, the tool
reports authenticated=false.
Spawns the server, mocks the leetcode-query 'question(titleSlug:…)'
GraphQL operation, and asserts get_problem returns the expected
{ titleSlug, problem } envelope with a topicTags-as-string[] projection.

Also rewrites tests/e2e/README.md as the harness's actual user-facing
docs (how it mocks HTTP, how to author a fixture, why the default 'test'
script excludes tests/e2e), and removes the Phase 0 placeholder spec
now that the directory has real specs.
Comparing build/index.js mtime against only src/index.ts let edits to
any other module slip past the freshness check, exercising a stale
binary against the new specs — the very class of bug the hook was
supposed to prevent.

Now recursively walks src/, takes the max mtime of every .ts file, and
rebuilds when the binary lags behind any of them.

Verified manually: `touch src/utils/logger.ts && npm run test:e2e`
triggers a full rebuild (~3s) and bumps the binary mtime above the
touched source file.
…er HOME

Two fixes in spawn-server.ts surfaced by review:

1. `new URL(`file://${path}`)` does not percent-encode path segments,
   so `--import file:///Users/Some Name/.../preload.mjs` would fail when
   the harness lived under a path containing spaces or other URL-reserved
   characters (common on macOS user dirs). Switched to
   `pathToFileURL(PRELOAD).href` which is the documented primitive.

2. When a caller supplied `options.home`, the harness still wrote
   `fixture.json` into that directory but cleanup only removed homes
   the harness created. Specs that pre-seed credentials silently got an
   extra `fixture.json` byproduct. Now the fixture lives in its own
   `mkdtemp` directory regardless of who owns HOME, and the harness
   always cleans it up.
…e-harness

Phase 2: real e2e harness — spawn build/index.js + nock + isolated HOME
Pure-logic foundation for the server-enforced tutoring contract — no
server wiring yet, no behaviour change for existing tools.

- src/types/session.ts:        SessionState + HintLevel + SessionStatus
- src/types/errors.ts:         add HINT_LEVEL_TOO_LOW + SESSION_NOT_FOUND
- src/domain/hint-state-machine.ts:
                               advanceHint / resetSession / assertSolutionUnlocked
                               (the gate that solution-returning tools will call)
- src/domain/pedagogy.ts:      generateHint(problem, level, userCode?) projecting
                               problem-derived hint text per level (1..4).
                               userCode parameter is reserved for Phase 5
                               (workspace awareness) so the contract is stable.
- src/domain/session-store.ts: FileSessionStore — one JSON per slug under
                               ~/.leetcode-mcp/sessions, mode 0o600, with
                               malformed-file recovery and slug validation.

23 unit tests cover the level transitions, the gate, the hint projections,
and the session round-trip / path-traversal / malformed-file paths.
…ate machine + hint generator

Tools should depend on SessionService, not on FileSessionStore /
hint-state-machine / pedagogy directly — it's the seam that makes the
gate uniform across solution-returning tools and gives the wire layer
a single object to wire up.

- startOrResume(slug, language?): idempotent — re-running on a slug
  the user is already mid-way through preserves hint progress.
- get(slug): null when no start_problem call.
- advance(slug, problem): bumps level + persists + returns generated
  hint text; throws SESSION_NOT_FOUND when called without start_problem.
- reset(slug): zeroes level / attempts / lastLocalRunPassed.
- assertSolutionUnlocked(slug): the gate that solution tools call.

Pure domain types (no IO) move into FileSessionStore via constructor
injection so tests can pass an in-memory or fixture-backed store.
…s with MCP instructions field

MCP prompts are opt-in; "agent must remember to call leetcode_learning_mode"
is precisely the kind of instruction-following LLMs reliably fail at.

The MCP `instructions` field, supported on McpServer's ServerOptions
since the SDK shipped MCP 2025-06-18 support, is delivered to clients at
handshake regardless of whether the agent ever asks for prompts. The
single source of truth for the pedagogy contract now lives there and is
read once per session.

Kept as a plain exported constant so it's easy to unit-test
independently and to evolve as later phases land (workspace
awareness, runners, strict-mode submission gating).
…tate / reset_session

Four new MCP tools that drive the pedagogy state machine.

- start_problem(titleSlug, language?): idempotent — opens (or resumes)
  a tutoring session. Must be called before request_hint, list_problem_solutions,
  or get_problem_solution. Re-running on a slug the user is already
  mid-way through preserves their hint progress.

- request_hint(titleSlug): advances the hint level by 1 and returns
  generated text. Levels: 1 clarification → 2 approach → 3 implementation
  sketch → 4 solution unlock.

- get_session_state(titleSlug): returns the persisted session for a
  problem, or null if start_problem was never called for it. Useful for
  restoring context after a restart.

- reset_session(titleSlug): clears the session back to level 0 with
  attempts/lastLocalRunPassed zeroed. Lifecycle status reset to 'started'.

Errors flow through a shared errorEnvelope that surfaces the structured
`code` field (HINT_LEVEL_TOO_LOW / SESSION_NOT_FOUND / etc) alongside
the human message, so clients can dispatch on it. Re-exported so the
solution tools can render the same shape when their gate trips.
…nt level

Both community-solution tools now reject with HINT_LEVEL_TOO_LOW until
the active session for the slug has reached the maximum hint level.

- list_problem_solutions: gates on the questionSlug parameter that
  is already required.
- get_problem_solution: adds a required `titleSlug` parameter (the
  topicId alone doesn't tell us which session to gate on). New
  parameter, additive: existing clients calling without titleSlug see
  a clear "required" error. Tool descriptions explicitly call out the
  GATED status so any agent reading them knows the contract.

Errors render through the shared errorEnvelope so a missing or
underleveled session surfaces as the structured code, not a free-form
text `error: ...`.
…strap

- McpServer constructor now receives ServerOptions with `instructions`
  set to the canonical pedagogy contract — delivered at handshake.
- SessionService is constructed once per server lifetime and shared
  between the new session tools and the gated solution tools.
- registerSessionTools and registerSolutionTools take the service via
  constructor injection, so unit/integration tests can supply a store
  backed by a per-test temp dir.
- tests/integration/solution-tools-integration.test.ts: rewritten to
  inject a SessionService over a per-test temp dir. Adds three gate
  cases (no session, session below unlock, session at unlock) for both
  list_problem_solutions and get_problem_solution. 8 tests, all pass.

- tests/e2e/lifecycle.test.ts: extends the expected-tools list to
  include start_problem, request_hint, get_session_state, reset_session.
  Catches drift between the registered tool set and the wire surface.

- tests/e2e/pedagogy-gate.test.ts: end-to-end regression for the gate.
  Drives a real server through start_problem → request_hint × 4 with a
  mocked GraphQL fixture, asserts list_problem_solutions rejects with
  HINT_LEVEL_TOO_LOW at every pre-unlock level and only opens up at
  level 4. Second case: reset_session clamps back to 0 and re-engages
  the gate. Locks the contract end-to-end so a future refactor that
  drops the gate fails CI before clients do.

178/178 unit+integration; 11/11 e2e (was 9; +2 gate cases).
…dagogy

Phase 3: server-enforced pedagogy state machine + new session/hint tools
…ict-mode gate

Merge PR #39 after review fixes and local verification.
@SPerekrestova SPerekrestova self-assigned this Jun 1, 2026
Copy link
Copy Markdown
Owner Author

@SPerekrestova SPerekrestova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed for correctness/security. Three concrete findings worth addressing — none are blockers, but #1 is a real (latent) state-machine bug and #2 is a security hardening gap.

1. recordLocalRun keeps solved across failing runssrc/domain/session-service.ts:164-165

status:
    session.status === "started" ? "attempting" : session.status,

Once a session reaches solved, a subsequent failing run_local_tests keeps status === "solved" while lastLocalRunPassed flips to false — the docstring claims the lifecycle stays "accurate", but a failed run after solving silently keeps the solved label. Today nothing in the codebase sets "solved" (only "started""attempting" happens), so this is latent — but the moment something writes "solved", this branch will lie. Suggest: when passed === false, demote from "solved" back to "attempting".

2. sandbox-exec profile escaping is incompletesrc/runner/sandbox.ts:161

(allow file-write* (subpath "${cwdAllowed.replace(/"/g, '\\"')}"))

Only " is escaped; a \ in the path will produce a broken escape or terminate the string early, breaking the profile (and theoretically allowing injection). Today cwdAllowed only ever comes from mkdtemp(join(tmpdir(), "leetcode-mcp-run-")), so the risk is theoretical — but it's a foot-gun if any future caller passes a less-controlled path. Suggest: cwdAllowed.replace(/\\/g, "\\\\").replace(/"/g, '\\"'), and reject paths containing newlines.

3. spawnAndCapture finalizes twice on error then closesrc/runner/subprocess-runner.ts:322-350
The error handler resolve(...)s and the close handler later runs finalize() which also resolve(...)s. Promise semantics make the second resolve a no-op, so no observable bug — but Buffer.concat(stdout) + clampOutput rerun on potentially large buffers, and the listeners aren't torn down. Suggest a settled flag (or detach the other listener) so the second path short-circuits.

Other things I checked and they look correct: assertSafeSlug regex, applyValidatedCredentials reference semantics, advanceHint clamping at MAX_HINT_LEVEL, env scrubbing in spawnAndCapture (only PATH/HOME/LANG forwarded), the 1MB output cap belt-and-braces, and the zod passthrough() usage.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant