Bug-hunt: fix correctness/UX/security issues across DM, voice, MLS, storage#12
Merged
Conversation
Introduce MessageIdGen ({sent_at_ms}-{seq}) so message ids no longer derive
from messages.len(), which stays flat when upsert replaces a message and lets
two same-millisecond stamps collide. Cell-based so the stamping path stays &self.
Add CallState::end_kind so both the local and remote CallEnd paths label by phase, not by (duration==0 && reason=="no_answer"). A peer hangup of an unanswered call no longer logs as a completed 0s call.
…orage from_snapshot now returns Result; a truncated/tampered MLS snapshot errors rather than restoring an empty store that masquerades as a fresh session. Also document add_peer as 2-party-only (it discards the commit).
drain_inbound now logs and skips a frame that fails to decrypt instead of aborting the whole drain (which also failed the caller). Track processed commits so the joiner's own admission commit and gossip duplicates are no-ops. Switch message ids to the monotonic generator.
Remote CallEnd now classifies via CallState::end_kind; message ids come from the per-session generator instead of messages.len().
Drop the messages.len()-based id in favor of the per-session generator.
resolve_request_end uses saturating_add so a hostile Range (bytes=0-<u64::MAX>) can't panic in debug or wrap to an empty window in release.
A transient keychain failure at boot is no longer memoized for the whole process; cache_on_success retries until use_native_store succeeds.
One torn JSONL record after a crash no longer fails the entire conversation read; bad lines are logged and skipped.
Parse the fragment via a strict fp= prefix and require 32 hex chars, matching the group path; a bare or malformed fragment is rejected.
Snapshot+increment the seq synchronously before the async seal so two in-flight frames can't share a nonce. Add an in-flight guard so overlapping 20ms poll drains can't reorder frames, and check cancelled after each setup await so a torn-down effect doesn't leak handles or clobber a newer run. Extract the drain loop into call-drain for unit testing.
sealFrame throws when seq exceeds the 63-bit value space rather than masking it down and silently reusing a low seq's nonce.
stop() returns the in-flight promise so the auto-stop timer and a manual stop can't both call recorder.stop() and trigger InvalidStateError.
Channels/groups dedupe own messages by fingerprint so a same-named peer still counts and a renamed self does not. Extract notificationBody for the toast text.
…anges A fingerprint offered in one channel/group no longer stays marked as offered in another.
Parse the hash with URLSearchParams and require the fp key; a bare #hex or a stray fp= no longer slips through as a valid fingerprint.
Filter focusable elements by closest('[aria-hidden=true],[inert]') so Tab can't
land on controls hidden via an ancestor.
compactDetail JSON-stringifies object/array values instead of rendering [object Object].
jsdom ships no Web Audio or WebCodecs, so tests stub these globals with inspectable doubles (context, oscillator, gain, buffer source, decoder).
… backstop Schedule an oscillator stop after the ring pattern and close the AudioContext on ended, so a missed stop() can no longer leak the context for the session.
…ap-aware Resync scheduling to the clock when it drifts past 200ms after a stall instead of growing latency for the rest of the call. Thread the frame seq into pushFrame and derive the decoder timestamp from it (masking the direction bit) so frames the jitter buffer dropped aren't labelled contiguous.
nextActiveTarget force-selected sessions[0] whenever the current target was not in the latest snapshot, so an in-flight 1s poll that returned before a just-created session appeared would yank the user out of the chat they had just opened. Track every target id ever observed (seenRef) and pass it to nextActiveTarget. A non-null current that was never seen is kept (freshly created, poll has not caught up); it only auto-switches away once the target was seen and is now gone (real delete), or when current is null.
The follow-up snapshot poll scheduled in refresh's finally block had no cleanup, so it would run on an unmounted hook (wasted gateway calls + setState on a dead component). refreshRef was also assigned during render. Store the timeout id and clear it in the effect cleanup; assign refreshRef inside the effect instead of during render.
Bump version and document the bug-hunt fixes in the changelog.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug-hunt sweep across the DM, voice-call, MLS/group, storage, and media layers. 25 atomic commits, each with tests (or a documented rationale where a fix was a false positive / won't-fix). Found via a structured hunt; every shipped fix is covered by a unit test or, for lifecycle/concurrency fixes, by reasoning + typecheck.
Voice call (crypto + playback)
Private DM / messaging
len()-based id collisions).Groups / MLS
Resultinstead of silently emptying storage.Invites / security
fp=prefix and validate the fingerprint (TS + Rust).Storage
Media / a11y / diagnostics
aria-hidden/inertsubtrees from the focus trap.Tests / infra
Verification
Rust
cargo test: 92 passing · TSvitest: 107 passing ·tsc --noEmit: clean.Known not-fixed (tracked in
BUGS-TODO.md): global message-queue atomicity (#7), send-stuck-Pending on crash (#8), admin-handoff loss (#18), MLS commit reorder-resync (#3 carry-over) — each needs a fault-injection / multi-node harness larger than the fix and is gated on the symptom actually appearing.