fix: guard finally-block state reset against replacement queries (lr-8355)#301
Conversation
…queries (lr-8355) processQueryStream's finally block unconditionally reset pendingPermissions, pendingAskUser, pendingElicitations, and isProcessing even when a newer query had already taken over the session (e.g. rewind_execute nulls queryInstance, then the user immediately sends a new message, starting a fresh query while the old stream's for-await is still unwinding). The old stream's finally then wiped the new query's in-flight permission entry — the SDK sat blocked awaiting that resolver, the permission card vanished, and approval clicks were silently ignored. isProcessing was also flipped false mid-stream for the live query. Guard the whole reset block with the same session.queryInstance === myQueryInstance || session.queryInstance === null check already used two lines above for handle/abortController cleanup. Regression test drives the exact race: start query A, simulate a rewind that swaps in query B with its own pending permission, then unblock A's stream so its finally runs while B is live. Asserts B's permission entry and isProcessing survive, and that normal (non-superseded) completion still resets state as before.
There was a problem hiding this comment.
PEACHES — 1 nit
lib/sdk-bridge.js:1034 (guard condition) — amos.code-craft.1 — The ownership guard condition session.queryInstance === myQueryInstance || session.queryInstance === null is correctly applied to all four state-reset fields (pendingPermissions, pendingAskUser, pendingElicitations, isProcessing). The guard matches the spec in lr-8355 exactly and prevents a replacement query's in-flight state from being wiped by an unwinding query's finally block.
test/sdk-bridge-replacement-query-lr-8355.test.js:170 — peaches.nit.observation — Test line 170 checks if (session.streamPromiseA) which does not exist; session.streamPromise is overwritten when query B starts (line 158). The condition is always false, so the await at line 170 is skipped. The test still passes because subsequent setImmediate() calls (lines 172–173) yield to the event loop, allowing the finally block to run. However, the test would be more explicit if it saved the promise from query A before starting query B: e.g., var streamPromiseA = session.streamPromise; after line 143, then await it at line 170 to guarantee the finally block has completed before assertions.
Verdict:
- Ownership guard is correctly scoped and condition matches spec.
- All four state-reset fields are protected.
- No code-craft violations detected.
- Test passes and covers the race scenario, though promise tracking could be more explicit.
Review status: clean.
There was a problem hiding this comment.
BOBBIE — clean
Reviewed lib/sdk-bridge.js finally-block ownership guard (lr-8355) and the new test/sdk-bridge-replacement-query-lr-8355.test.js regression test.
Security-relevant question addressed directly: does the guard introduce a path where a permission is auto-resolved, dropped, or a stale resolver is reused across queries in a way that bypasses an approval gate?
- No auto-resolve: the reset (session.pendingPermissions = {}, etc.) never calls .resolve()/.reject() on pending entries — it only replaces the map reference. Permission resolvers are per-request closures set up in handleCanUseTool (lib/sdk-bridge.js:696-706); the finally block never touches them directly.
- No cross-query resolver reuse: myQueryInstance is captured fresh at function entry (lib/sdk-bridge.js:824), before any await, giving each processQueryStream invocation an immutable per-call identity. The new guard at line 1034 (if (session.queryInstance === myQueryInstance || session.queryInstance === null)) mirrors the identical, already-shipped ownership check at line 971 used for queryInstance.close() cleanup — this PR extends a proven pattern, it does not introduce a new one.
- Fails closed, not open: prior to this fix, an old query's finally could wipe a newer query's pendingPermissions/pendingAskUser/pendingElicitations map out from under it, causing legitimate approval clicks to reference a vanished request (a hang). The fix is strictly more conservative — it skips the reset instead of skipping a check, so no scenario introduces auto-approval.
- MCP-mode pendingAskUser preservation logic (mode === "mcp" filter, lines 1043-1047) is unchanged from prior behavior, just now gated by the same ownership check.
Scanners:
- gitleaks: no leaks found on commit 49f3a68.
- semgrep --config=auto on lib/sdk-bridge.js: 5 unsafe-formatstring hits, all on console.error/console.log lines outside the diff hunk (897, 904, 1643, 1706, 1720) and pre-existing — not introduced by this PR, and console.error format-string forging is not applicable to Node.
- osv-scanner on package-lock.json: 46 known vulnerabilities in existing dependencies, but package.json/package-lock.json are untouched by this diff — none introduced by this PR, out of scope per bobbie.dep.1/2.
No .crew/bobbie.yaml project rule patterns (vapid/authToken/sk-*) matched.
early_exit_reason: guard-only change extending an already-shipped ownership-check pattern; no reachable sink, no new dependency, no secret literal; regression test verifies the race condition directly.
scanners_run: gitleaks (ok, no findings), semgrep (ok, 5 pre-existing findings outside diff, not counted), osv-scanner (ok, 46 pre-existing findings outside diff, not counted)
{"reviewer": "bobbie", "review_status": "clean", "head_sha": "49f3a68587b0b3a138e38fe7b262c29236d377b1", "pr_number": 301}
clagentic gate-note — authorized
Authorize rationale: pre-checks passed; NAOMI authorized merge |
What
lib/sdk-bridge.js processQueryStream's finally block (~:1021-1041) unconditionally reset session.pendingPermissions, filtered pendingAskUser, reset pendingElicitations, and flipped isProcessing=false -- even when a NEWER query already owned the session. The existing queryInstance===myQueryInstance guard at :971 only covered handle/abortController cleanup, not this state-reset block.
Why
rewind_execute (project-sessions.js:874-965) nulls session.queryInstance and starts a fresh query when the user immediately sends a new message. If the OLD query's stream (for-await loop) is still unwinding in the background at that point, its finally block ran anyway and wiped the NEW query's pendingPermissions entry out from under it. The SDK sat blocked awaiting that resolver, the permission card disappeared client-side, and approval clicks were silently ignored -- the operator-visible 'approve does nothing' / hung-turn bug (lr-8355). isProcessing was also flipped false mid-stream for the still-live new query, causing UI/drain/reaper desync.
Fix
Wrapped the reset block (pendingPermissions, pendingAskUser filtering, pendingElicitations, isProcessing) in the same ownership check already used for handle/abortController cleanup two lines above: only run it when session.queryInstance equals myQueryInstance OR session.queryInstance is null. When a newer query has taken over, skip the reset entirely and let that query's own finally block manage session state when it completes.
Task
lr-8355
Tests
npm test: 610/610 passing (608 pre-existing + 2 new regression tests for lr-8355). One pre-existing test (daemon exits EX_CONFIG ... CLAGENTIC_HOME) is intermittently flaky on unmodified main as well (confirmed by running the full suite against main before this change) -- unrelated to sdk-bridge.js, not touched by this diff.
New test file: test/sdk-bridge-replacement-query-lr-8355.test.js -- drives the exact race (start query A, simulate rewind swapping in query B with its own pending permission, unblock A's stream so its finally runs while B is live) and asserts B's permission entry and isProcessing survive. A second test confirms normal (non-superseded) completion still resets state as before.
No new dependencies. No Anthropic SDK usage introduced.