fix: destroySession race with concurrent getOrCreateSession (#106)#107
fix: destroySession race with concurrent getOrCreateSession (#106)#107
Conversation
Closes #106. destroySession awaited closeSession before calling store.deleteSession, and closeSession always persisted a fresh snapshot on its way out. During the ~0-5s child-exit await window a concurrent getOrCreateSession for the same chatId could read that just-persisted snapshot from store and spawn a --resume, reviving the session the user just tried to /clean. Delete from the store first, then close with { persist: false } so the persist step never re-adds the entry. Media cleanup still runs afterward. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Fixes a race where /clean (destroySession) could leave a resumable snapshot in the session store while awaiting a child process exit, allowing a concurrent getOrCreateSession to revive the just-destroyed session.
Changes:
- Added a
{ persist?: boolean }option tocloseSession()and gated the final store write behind it. - Reordered
destroySession()to delete stored state before awaiting child exit, and invokedcloseSession(..., { persist: false })to avoid re-persisting. - Added a race-focused unit test that injects a slow-exiting child and asserts the store is empty mid-destroy.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
bot/src/session-manager.ts |
Reorders destroy vs. close and adds optional persistence to prevent store resurrection during shutdown await. |
bot/src/__tests__/session-manager.test.ts |
Adds a regression test validating store deletion happens before the child-exit await completes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -454,7 +454,7 @@ export class SessionManager { | |||
| } | |||
|
|
|||
| /** Close a session: persist state, SIGTERM child, clean up. */ | |||
There was a problem hiding this comment.
The JSDoc says closeSession always persists state, but the new { persist } option means it may intentionally skip persistence. Update the doc comment to reflect the optional persistence behavior (and what persist: false implies) so callers aren’t misled.
| /** Close a session: persist state, SIGTERM child, clean up. */ | |
| /** Close a session: optionally persist state, SIGTERM child, and clean up. | |
| * By default the final session state is written to the store before teardown. | |
| * Pass `{ persist: false }` to skip that write when the caller intentionally | |
| * does not want the final in-memory state persisted (for example, to avoid races). | |
| */ |
| injectDir, | ||
| }; | ||
|
|
||
| (manager as unknown as { active: Map<string, ActiveSession> }).active.set("chat-race", fakeSession); |
There was a problem hiding this comment.
This test injects a fake session directly into manager.active but doesn’t increment sessionsActive; closeSession() will call sessionsActive.dec() and can push the global gauge negative, which may leak state across tests. Consider incrementing the gauge when inserting the fake session (or resetting metrics in this suite) to keep the metric consistent.
Summary
Why
destroySession awaited closeSession, which persisted a fresh snapshot on its way out. During the 0-5s SIGTERM to exit window, a concurrent getOrCreateSession for the same chatId could read that snapshot and spawn with --resume, reviving the session the user just /cleaned. Root cause and reproduction in #106.
Test plan