fix(notifier): resolve waitWebSocket promise on WS error to prevent pull-loop wedge #72
Merged
…ull-loop wedge

Previously the RxJS subject's error handler only logged on subject.error(), leaving the pull-loop awaiting the promise until the 20s timeout. With a persistent WS disconnect, every cycle went through the timeout-only path with no events flowing, observed as a per-source pump wedge in data-pathways production on 2026-05-10.

Three fixes:
(1) the error handler resolves the promise, so the loop re-enters wait() within milliseconds;
(2) the promise and eventResolver are bound before connect(), so a synchronous error during connect doesn't race against an unbound resolver;
(3) try/finally guarantees disconnect() runs on every exit path, fixing a socket leak.

Adds _internals.createNotificationClient test seam so the WS subject lifecycle can be driven deterministically in unit tests.

Refs: data-pathways outage bddefdd2-c377-472a-bc50-cd75f708f822

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
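The three fixes described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `notifier.ts` code: `waitForEvent`, `Observer`, `NotificationClient`, and the `Outcome` type are hypothetical names introduced here, and a plain observer object stands in for the RxJS subject.

```typescript
// Hypothetical stand-ins; none of these names are taken from the real notifier.ts.
type Outcome = "event" | "error" | "timeout";
type Observer<T> = { next: (value: T) => void; error: (err: unknown) => void };

interface NotificationClient {
  connect(): Promise<void>;
  disconnect(): void;
}

async function waitForEvent(
  createClient: (observer: Observer<string>) => NotificationClient,
  timeoutMs: number,
): Promise<Outcome> {
  // (2) Create the promise and capture its resolver BEFORE connect(), so an
  // error raised synchronously inside connect() already has a resolver to call.
  let resolve!: (outcome: Outcome) => void;
  const settled = new Promise<Outcome>((r) => {
    resolve = r;
  });

  const client = createClient({
    next: () => resolve("event"),
    // (1) Resolve on error instead of only logging, so the caller is not
    // left waiting for the timeout.
    error: () => resolve("error"),
  });

  const timer = setTimeout(() => resolve("timeout"), timeoutMs);
  try {
    await client.connect();
    return await settled;
  } finally {
    // (3) Runs on every exit path: event, error, timeout, or a rejected connect().
    clearTimeout(timer);
    client.disconnect();
  }
}
```

With a fake client whose `connect()` reports an error synchronously, `waitForEvent` settles with `"error"` right away and still calls `disconnect()` once, rather than sitting on the timeout.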
0fe6aa8 to 87762d5
Summary
- Resolve the `waitWebSocket()` promise immediately on RxJS subject `error()` instead of waiting up to 20 s for the timeout, eliminating the per-source pull-loop wedge observed in `data-pathways` production on 2026-05-10.
- Bind the promise and `eventResolver` before `notificationClient.connect()` so a synchronous error during connect cannot race against an unbound resolver.
- Wrap the `await` in `try/finally` so `notificationClient.disconnect()` runs on every exit path (error, timeout, success, abort); fixes a pre-existing socket leak on the error path.
- Add `_internals.createNotificationClient` test seam so the WS subject lifecycle can be driven deterministically in unit tests. Not part of the public API.

Why
Production incident 2026-05-10: PAT events written to flowcore stopped propagating to Usable. Root cause located in `src/data-pump/notifier.ts:107-140`: the subject's `error` handler only logged, so the awaited promise never resolved on WS disconnect. The loop spun on 20 s timeouts with no events flowing. Full investigation in Outage fragment `bddefdd2-c377-472a-bc50-cd75f708f822`.

Test plan
- `deno fmt --check`
- `deno lint`
- `deno check src/mod.ts`
- `deno test -A test/tests/notifier.test.ts`: 8 cases, 0 failures
- `deno test -A`: full suite, 10 passed (47 steps), 0 failed

Tests cover
- … `wait()` resolves the promise within 50 ms (regression for the wedge)
- … `wait()` call
- … `wait()` delivers events normally
- `disconnect()` runs on every exit path (error, timeout, success, abort)
- … `next()` fires on a terminated subject
- … `connect()` resolves: `wait()` resolves cleanly

🤖 Generated with Claude Code
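For readers unfamiliar with the test-seam pattern, the sketch below shows one way a replaceable factory on an `_internals` object lets a unit test drive the error path deterministically. Only the names `_internals.createNotificationClient` and `wait()` come from the PR text; the signatures, the `FakeableClient` and `ErrorHandler` types, and the body of `wait` are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real notifier's types are not shown in this PR.
interface FakeableClient {
  connect(): Promise<void>;
  disconnect(): void;
}
type ErrorHandler = (err: unknown) => void;

// Test seam: the code under test calls the factory through this object,
// so a test can replace the property with a fake before calling wait().
const _internals = {
  createNotificationClient: (_onError: ErrorHandler): FakeableClient => {
    throw new Error("real client is not available in this sketch");
  },
};

async function wait(timeoutMs: number): Promise<void> {
  let done!: () => void;
  const settled = new Promise<void>((r) => {
    done = () => r();
  });
  // The factory is looked up at call time, which is what makes the swap work.
  const client = _internals.createNotificationClient(() => done());
  const timer = setTimeout(done, timeoutMs);
  try {
    await client.connect();
    await settled;
  } finally {
    clearTimeout(timer);
    client.disconnect();
  }
}
```

A test can then assign a fake to `_internals.createNotificationClient`, capture the error handler it receives, fire it by hand, and assert that `wait()` settles promptly and that `disconnect()` was called, without opening a real WebSocket.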