Phase 3: server-enforced pedagogy state machine + new session/hint tools#38

Merged
SPerekrestova merged 7 commits into devin/1778098342-phase-0-hygiene from devin/1778178452-phase-3-pedagogy
May 7, 2026

Conversation

@devin-ai-integration

Summary

Phase 3 of the redesign plan. Pedagogy moves from honor-system prompts to a server-enforced state machine: a per-problem session JSON tracks hintLevel (0–4), attempts, lastLocalRunPassed, status. list_problem_solutions and get_problem_solution are gated at the wire — they reject with HINT_LEVEL_TOO_LOW until the active session has reached the maximum level. The MCP instructions field replaces the prompt-based "remember to invoke X first" dance with rules clients receive once at handshake.

Stacked on top of #37; base auto-rebases to main once #37 reaches main.

Domain layer (src/domain/, one commit, pure logic + tests):

  • hint-state-machine.ts: advanceHint / resetSession / assertSolutionUnlocked. No IO.
  • pedagogy.ts: generateHint(problem, level, userCode?). Level 1 clarification → 2 approach → 3 implementation sketch → 4 solution unlock. The userCode parameter is reserved for Phase 5 (workspace awareness) so the contract is stable.
  • session-store.ts: FileSessionStore, one JSON file per slug under ~/.leetcode-mcp/sessions/, mode 0o600, with malformed-file recovery and slug validation (rejects anything not matching ^[a-z0-9-]+$ to prevent path traversal).
  • 23 unit tests under tests/domain/ cover every transition, gate, hint projection, and store edge case.
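
To make the transitions and the gate concrete, here is a minimal sketch of what a pure state machine of this kind could look like. It is not the code from src/domain/hint-state-machine.ts: the field types, the GateError class, the clamping at level 4, and the SESSION_NOT_FOUND behaviour of the gate are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the PR's real types live in src/types/session.ts.
type HintLevel = 0 | 1 | 2 | 3 | 4;
const MAX_HINT_LEVEL: HintLevel = 4;

interface SessionState {
  slug: string;
  hintLevel: HintLevel;
  attempts: number;
  lastLocalRunPassed: boolean | null; // assumed representation of "no local run yet"
  status: string; // only 'started' is named in the PR
}

type GateErrorCode = "HINT_LEVEL_TOO_LOW" | "SESSION_NOT_FOUND";

// Hypothetical error class carrying the structured code.
class GateError extends Error {
  readonly code: GateErrorCode;
  constructor(code: GateErrorCode, message: string) {
    super(message);
    this.code = code;
  }
}

// Pure transition: returns a new state, clamped at the maximum level (assumed).
function advanceHint(state: SessionState): SessionState {
  const next = Math.min(state.hintLevel + 1, MAX_HINT_LEVEL) as HintLevel;
  return { ...state, hintLevel: next };
}

// Pure transition: back to level 0 with counters cleared.
function resetSession(state: SessionState): SessionState {
  return { ...state, hintLevel: 0, attempts: 0, lastLocalRunPassed: null, status: "started" };
}

// The gate: throws unless a session exists and has reached the maximum level.
function assertSolutionUnlocked(state: SessionState | null): void {
  if (state === null) {
    throw new GateError("SESSION_NOT_FOUND", "call start_problem first");
  }
  if (state.hintLevel < MAX_HINT_LEVEL) {
    throw new GateError(
      "HINT_LEVEL_TOO_LOW",
      `hint level ${state.hintLevel} of ${MAX_HINT_LEVEL} reached`,
    );
  }
}
```

Because every function returns a new object and touches no IO, each transition can be tested without a store or a server.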

Application layer (src/domain/session-service.ts, one commit):

  • SessionService is the single seam tools depend on. startOrResume is idempotent — re-running on a slug the user is already mid-way through preserves hint progress. assertSolutionUnlocked is the gate.
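
The idempotency claim is easiest to see in code. The sketch below is a simplified stand-in, not the PR's SessionService: the Session shape, the synchronous SessionStore interface, and the InMemorySessionStore class are hypothetical, and the real store is file-backed.

```typescript
// Hypothetical, simplified session shape.
interface Session {
  slug: string;
  language: string | undefined;
  hintLevel: number;
  attempts: number;
}

// Hypothetical store interface; synchronous here only to keep the sketch short.
interface SessionStore {
  load(slug: string): Session | null;
  save(session: Session): void;
}

class InMemorySessionStore implements SessionStore {
  private readonly sessions = new Map<string, Session>();

  load(slug: string): Session | null {
    const found = this.sessions.get(slug);
    return found ? { ...found } : null; // copy so callers cannot mutate stored state
  }

  save(session: Session): void {
    this.sessions.set(session.slug, { ...session });
  }
}

class SessionService {
  private readonly store: SessionStore;

  constructor(store: SessionStore) {
    this.store = store;
  }

  // Idempotent: an existing session is returned untouched, so hint progress survives.
  startOrResume(slug: string, language?: string): Session {
    const existing = this.store.load(slug);
    if (existing !== null) return existing;
    const fresh: Session = { slug, language, hintLevel: 0, attempts: 0 };
    this.store.save(fresh);
    return fresh;
  }

  get(slug: string): Session | null {
    return this.store.load(slug);
  }

  // Bumps the level and persists; fails when start_problem was never called.
  advance(slug: string): Session {
    const session = this.store.load(slug);
    if (session === null) throw new Error("SESSION_NOT_FOUND");
    const next: Session = { ...session, hintLevel: Math.min(session.hintLevel + 1, 4) };
    this.store.save(next);
    return next;
  }
}
```

Calling startOrResume a second time on the same slug returns the stored session rather than a fresh one, which is the property the tools rely on.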

Wire layer (four commits):

  • src/mcp/server-instructions.ts — the instructions string. Plain exported constant so it's unit-testable and easy to evolve.
  • src/mcp/tools/session-tools.ts — four new tools: start_problem, request_hint, get_session_state, reset_session. Errors flow through a shared errorEnvelope that surfaces the structured code field alongside the human message.
  • src/mcp/tools/solution-tools.ts — gates list_problem_solutions (already takes questionSlug) and adds a required titleSlug parameter to get_problem_solution so the gate has session context (a topicId alone doesn't tell us which session to verify).
  • src/index.ts: the McpServer constructor receives ServerOptions with instructions set. SessionService is constructed once per server lifetime and shared across the new and gated tools.
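
As a rough illustration of the shared error shape, the sketch below wraps a thrown error into an MCP-style tool result whose text payload carries both the code and the message. The function body, the JSON layout of the payload, and the "UNKNOWN" fallback are assumptions; only the name errorEnvelope and the idea of surfacing a structured code come from this PR.

```typescript
// Shape of an MCP tool result that signals an error with a single text item.
interface ToolErrorResult {
  isError: true;
  content: { type: "text"; text: string }[];
}

// Pulls a string `code` off an arbitrary thrown value, if one is present.
function extractCode(err: unknown): string {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    if (typeof code === "string") return code;
  }
  return "UNKNOWN"; // hypothetical fallback for errors without a structured code
}

// Hypothetical envelope: JSON text so clients can dispatch on `code`.
function errorEnvelope(err: unknown): ToolErrorResult {
  const message = err instanceof Error ? err.message : String(err);
  const payload = { code: extractCode(err), message };
  return {
    isError: true,
    content: [{ type: "text", text: JSON.stringify(payload) }],
  };
}
```

A client can then parse the text item and branch on code (for example HINT_LEVEL_TOO_LOW) instead of pattern-matching a free-form message.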

Tests (one commit, additive):

  • tests/integration/solution-tools-integration.test.ts rewritten to inject a SessionService over a per-test temp dir; adds three gate cases (no session, below unlock, at unlock) for each tool. 8 tests, all pass.
  • tests/e2e/lifecycle.test.ts — expected-tools list extended with the four new tools.
  • tests/e2e/pedagogy-gate.test.ts (new): drives a real server through start_problem → request_hint × 4 with a mocked GraphQL fixture, asserts list_problem_solutions rejects with HINT_LEVEL_TOO_LOW at every pre-unlock level and only opens at level 4. Second case: reset_session clamps back to 0 and re-engages the gate.

Tests: unit/integration 152 → 178 (+26 — 23 domain + 3 gate); e2e 7 → 9 (+2 gate cases). npm run build clean; npm run test:types clean; npm run format clean; full npm run test:all exits 0.

Why not amend behavior beyond gating + new tools:

  • Existing tools (get_problem, submit_solution, etc.) keep their current signatures — Phase 3 deliberately ships only the state machine and the gate. Wiring submit_solution to the session for attempts tracking lands in Phase 6.
  • get_started is kept as-is; the redesign plan called for it to be deprecated alongside the instructions field rollout, but I'd rather observe that the instructions field is honored by the clients you care about before turning get_started into a stub. Trivial follow-up if you want it.
  • learning-prompts.ts survives unchanged for the same reason — clients on older MCP versions that don't honor instructions still see the prompts.

Breaking change: get_problem_solution now requires titleSlug in addition to topicId. The change is additive in that no existing parameter is altered, but existing clients calling without titleSlug will see a clear "required" Zod error.

Review & Testing Checklist for Human

  • Spot-check src/domain/hint-state-machine.ts and assertSolutionUnlocked — this is the gate and any bug here defeats the whole pedagogy contract
  • Confirm the instructions field copy in src/mcp/server-instructions.ts reads the way you want — it's the single source of truth the client will read at handshake
  • Verify the slug regex in src/domain/session-store.ts (^[a-z0-9-]+$) accepts every slug shape LeetCode actually uses (I checked a sample but you have a longer history with it)
  • Run npm run test:all locally to confirm e2e specs pass on your platform (Linux Node 22 in CI)
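
For the slug check in particular, a quick way to probe the regex is a small helper like the one below. The pattern is the one quoted in this PR; the helper names and the `<slug>.json` file naming are assumptions for illustration and may differ from src/domain/session-store.ts.

```typescript
// Pattern quoted in the PR description.
const SLUG_PATTERN = /^[a-z0-9-]+$/;

// True only for non-empty strings made of lowercase letters, digits, and hyphens.
function isValidSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

// Hypothetical helper: derive a file name, refusing anything that could escape the directory.
function sessionFileName(slug: string): string {
  if (!isValidSlug(slug)) {
    throw new Error(`invalid slug: ${JSON.stringify(slug)}`);
  }
  return `${slug}.json`; // assumed naming scheme
}
```

Slugs such as two-sum and 3sum pass, while inputs containing a slash, a dot, uppercase letters, or nothing at all are rejected before any path is built.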

Notes

Link to Devin session: https://app.devin.ai/sessions/d003a60939484686b2953ae32fe2794d
Requested by: @SPerekrestova

Pure-logic foundation for the server-enforced tutoring contract — no
server wiring yet, no behaviour change for existing tools.

- src/types/session.ts:        SessionState + HintLevel + SessionStatus
- src/types/errors.ts:         add HINT_LEVEL_TOO_LOW + SESSION_NOT_FOUND
- src/domain/hint-state-machine.ts:
                               advanceHint / resetSession / assertSolutionUnlocked
                               (the gate that solution-returning tools will call)
- src/domain/pedagogy.ts:      generateHint(problem, level, userCode?) projecting
                               problem-derived hint text per level (1..4).
                               userCode parameter is reserved for Phase 5
                               (workspace awareness) so the contract is stable.
- src/domain/session-store.ts: FileSessionStore — one JSON per slug under
                               ~/.leetcode-mcp/sessions, mode 0o600, with
                               malformed-file recovery and slug validation.

23 unit tests cover the level transitions, the gate, the hint projections,
and the session round-trip / path-traversal / malformed-file paths.
…ate machine + hint generator

Tools should depend on SessionService, not on FileSessionStore /
hint-state-machine / pedagogy directly — it's the seam that makes the
gate uniform across solution-returning tools and gives the wire layer
a single object to wire up.

- startOrResume(slug, language?): idempotent — re-running on a slug
  the user is already mid-way through preserves hint progress.
- get(slug): null when no start_problem call.
- advance(slug, problem): bumps level + persists + returns generated
  hint text; throws SESSION_NOT_FOUND when called without start_problem.
- reset(slug): zeroes level / attempts / lastLocalRunPassed.
- assertSolutionUnlocked(slug): the gate that solution tools call.

The session store is supplied via constructor injection, so tests can pass
an in-memory or fixture-backed store instead of FileSessionStore.
…s with MCP instructions field

MCP prompts are opt-in; "agent must remember to call leetcode_learning_mode"
is precisely the kind of instruction-following LLMs reliably fail at.

The MCP `instructions` field, supported on McpServer's ServerOptions
since the SDK shipped MCP 2025-06-18 support, is delivered to clients at
handshake regardless of whether the agent ever asks for prompts. The
single source of truth for the pedagogy contract now lives there and is
read once per session.

Kept as a plain exported constant so it's easy to unit-test
independently and to evolve as later phases land (workspace
awareness, runners, strict-mode submission gating).
…tate / reset_session

Four new MCP tools that drive the pedagogy state machine.

- start_problem(titleSlug, language?): idempotent — opens (or resumes)
  a tutoring session. Must be called before request_hint, list_problem_solutions,
  or get_problem_solution. Re-running on a slug the user is already
  mid-way through preserves their hint progress.

- request_hint(titleSlug): advances the hint level by 1 and returns
  generated text. Levels: 1 clarification → 2 approach → 3 implementation
  sketch → 4 solution unlock.

- get_session_state(titleSlug): returns the persisted session for a
  problem, or null if start_problem was never called for it. Useful for
  restoring context after a restart.

- reset_session(titleSlug): clears the session back to level 0 with
  attempts/lastLocalRunPassed zeroed. Lifecycle status reset to 'started'.

Errors flow through a shared errorEnvelope that surfaces the structured
`code` field (HINT_LEVEL_TOO_LOW / SESSION_NOT_FOUND / etc) alongside
the human message, so clients can dispatch on it. Re-exported so the
solution tools can render the same shape when their gate trips.
…nt level

Both community-solution tools now reject with HINT_LEVEL_TOO_LOW until
the active session for the slug has reached the maximum hint level.

- list_problem_solutions: gates on the questionSlug parameter that
  is already required.
- get_problem_solution: adds a required `titleSlug` parameter (the
  topicId alone doesn't tell us which session to gate on). New
  parameter, additive: existing clients calling without titleSlug see
  a clear "required" error. Tool descriptions explicitly call out the
  GATED status so any agent reading them knows the contract.

Errors render through the shared errorEnvelope so a missing or
underleveled session surfaces as the structured code, not a free-form
text `error: ...`.
…strap

- McpServer constructor now receives ServerOptions with `instructions`
  set to the canonical pedagogy contract — delivered at handshake.
- SessionService is constructed once per server lifetime and shared
  between the new session tools and the gated solution tools.
- registerSessionTools and registerSolutionTools take the service via
  constructor injection, so unit/integration tests can supply a store
  backed by a per-test temp dir.
- tests/integration/solution-tools-integration.test.ts: rewritten to
  inject a SessionService over a per-test temp dir. Adds three gate
  cases (no session, session below unlock, session at unlock) for both
  list_problem_solutions and get_problem_solution. 8 tests, all pass.

- tests/e2e/lifecycle.test.ts: extends the expected-tools list to
  include start_problem, request_hint, get_session_state, reset_session.
  Catches drift between the registered tool set and the wire surface.

- tests/e2e/pedagogy-gate.test.ts: end-to-end regression for the gate.
  Drives a real server through start_problem → request_hint × 4 with a
  mocked GraphQL fixture, asserts list_problem_solutions rejects with
  HINT_LEVEL_TOO_LOW at every pre-unlock level and only opens up at
  level 4. Second case: reset_session clamps back to 0 and re-engages
  the gate. Locks the contract end-to-end so a future refactor that
  drops the gate fails CI before clients do.

178/178 unit+integration; 11/11 e2e (was 9; +2 gate cases).
@devin-ai-integration
Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Owner

LGTM


Generated by Claude Code

@devin-ai-integration
Author

Thanks for the LGTM. Continuing autonomously to Phase 4a (Python local runner) on a separate branch — will open the next PR shortly.

@SPerekrestova SPerekrestova merged commit 6b5f02b into devin/1778098342-phase-0-hygiene May 7, 2026
1 check passed