fix(session): reconcile late acks, serialize concurrent reactions, harden hook + ack error paths#348
Closed
rmyndharis wants to merge 1 commit into
Closed
fix(session): reconcile late acks, serialize concurrent reactions, harden hook + ack error paths#348rmyndharis wants to merge 1 commit into
rmyndharis wants to merge 1 commit into
Conversation
…rden hook + ack error paths - A fast delivered/read/failed ack could match 0 rows when it arrived before the send's 2nd save committed waMessageId, leaving the message stuck at SENT. onMessageAck now retries the forward-only, session-scoped status advance once after a short delay (ACK_RECONCILE_DELAY_MS) to close that race. - Two reactions on the same message both read the same stored snapshot and the last full-row save won, dropping a reaction. Reaction writes are now serialized per (sessionId, waMessageId) via a self-cleaning promise-chain map, preserving every reaction. - The ack status UPDATE was the one message handler without a .catch, so a connection-level rejection escaped to the global unhandled-rejection backstop without per-message context; it now logs locally. - A plugin hook returning both mutated data and an error had the (possibly partial/corrupted) data applied before the error was thrown, propagating it to persistence/webhooks/WS as success. The data mutation of an errored hook is now discarded.
Merged
rmyndharis
added a commit
that referenced
this pull request
Jun 19, 2026
* fix(webhook): deliver session lifecycle events and key webhook hardening (#335) * fix(security): pin outbound webhook and media fetches to validated IP (#338) * fix(plugins): persist plugin enable/config and restore (#339) * fix(message): persist bulk-sent messages, sanitize SSRF (#340) * fix(engine): return the real id for forwarded messages (#341) * fix(security): harden outbound requests, IPv6 SSRF (#344) * fix(security): secret-file perms, key pepper, allowedIps (#345) * fix(storage): bound tar imports, contain storage keys (#346) * fix(session): reconcile late acks, serialize reactions (#348) * fix(contract): webhook timeout, bounded shutdown, 501 (#350) * feat(session): force-kill a stuck session (#352) * merge #343 * merge #351 * chore(release): v0.4.3 — CHANGELOG, version bump (package.json/dashboard/swagger), README + docs
Owner
Author
|
Shipped in v0.4.3 (integrated via the release PR #354 and tagged |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Four small reliability fixes on the engine→session event path (delivery status, reactions, hooks). No wire-format change.
1. A delivery/read receipt that arrives before the send is recorded is no longer lost
A send stores the row optimistically (
PENDING, nowaMessageId) and only writeswaMessageId+SENTin a second save after the engine returns. The ack callbacks fire from independent engine event handlers, not gated on that send promise — so a fastdelivered/read/failedack could run the statusUPDATEwhile the row'swaMessageIdwas still null, matching 0 rows. Each ack is one-shot, so the message stayed stuck at "sent" forever.onMessageAcknow retries the forward-only, session-scoped advance once after a short delay, reconciling the status. The transition guard is unchanged, so the retry can't downgrade a higher status.2. Concurrent reactions on the same message no longer overwrite each other
onMessageReactiondid a read-modify-write of the singlemetadata.reactionsJSON column with anawaitbetween the read and the full-row save. Two reactions on the same message interleaved across that await: both read the same snapshot, both saved the whole map, last writer won — silently dropping a reaction. Reaction writes are now serialized per(sessionId, waMessageId)through a small promise-chain map that cleans up its own entries (no leak), so every reaction is applied on top of the previous one.3. The ack DB write no longer escapes to the global unhandled-rejection backstop
The ack status
UPDATEwas the onlymessage.*handler without a.catch— a connection-level rejection (pool exhaustion,SQLITE_BUSY) surfaced through the global backstop with no per-message context. It now logs locally like its sibling handlers.4. A plugin hook that reports an error no longer has its partial output applied
HookResultallows returningdataanderrortogether. The pipeline appliedcurrentData = result.databefore checkingresult.error, so a handler that mutated the payload and signalled failure had its (possibly half-transformed) data propagated to persistence, webhooks, and the WebSocket as if it succeeded. The data assignment is now gated onresult.error === undefined— an errored handler's mutation is discarded (on both the continue and the stop path). Error isolation (catch-and-continue) is unchanged.Tests
New unit tests: ack retry fires once on
affected:0and not when a row is advanced (fake timers); a rejected ack update produces no unhandled rejection; concurrent reactions from two senders both survive (DB-aliasing simulated with per-read snapshots); the reaction remove/delete branch; a failed reaction write doesn't block a later one; the serialization map is emptied after draining; the hook gate discards an errored mutation on both the continue andcontinue:falsepaths. Full backend 737/737; dashboard build + lint clean.Deferred (not in this PR)
Persisting phone-composed self-messages (sends typed on the linked phone) into the local
messagestable is intentionally left out: it requires aUNIQUE(sessionId, waMessageId)index migration, a boot-time dedup of legacy rows, and an upsert across two write paths (a naive insert double-persists API sends). That's a higher-risk change better done as its own PR with the migration.