Bug-hunt: fix correctness/UX/security issues across DM, voice, MLS, storage #12

Merged
ForeverInLaw merged 28 commits into main from fix/bug-hunt
Jun 17, 2026
Conversation

@ForeverInLaw

Contributor

Bug-hunt sweep across the DM, voice-call, MLS/group, storage, and media layers. 25 atomic commits, each with tests (or a documented rationale where a fix was a false positive / won't-fix). Found via a structured hunt; every shipped fix is covered by a unit test or, for lifecycle/concurrency fixes, by reasoning + typecheck.

Voice call (crypto + playback)

  • Stop AES-GCM nonce reuse; guard overlapping poll drains and setup/teardown ref races.
  • Reject out-of-range frame seq instead of silently masking it.
  • Gap-aware decoder timestamps (derive from seq) so dropped frames aren't decoded as contiguous.
  • Bound playback scheduling latency after a stall instead of letting it grow unbounded.
  • Ringtone: stop oscillators and close the AudioContext as a backstop (leak fix).
  • Make the recorder stop idempotent; classify a call as "missed" unless it reached Active.

Private DM / messaging

  • Monotonic per-session message-id generator; wired into DM/group/channel (kills len()-based id collisions).
  • Keep a just-created chat target across a stale in-flight snapshot poll (no more getting yanked out of a chat you just opened).
  • Reset the offered set when the active conversation changes.
  • Count unread by fingerprint rather than display name.

Groups / MLS

  • Keep draining on a bad frame; dedup re-applied gossip commits.
  • Surface snapshot corruption as a Result instead of silently emptying storage.

Invites / security

  • Require an anchored fp= prefix and validate the fingerprint (TS + Rust).

Storage

  • ciphertext-store: skip unparseable history lines instead of bricking the whole history.
  • secure-storage: cache the native store only on success.

Media / a11y / diagnostics

  • Saturate Range-header arithmetic to avoid overflow.
  • Exclude aria-hidden/inert subtrees from the focus trap.
  • Serialize nested detail values in diagnostics.

Tests / infra

  • New fake Web Audio / WebCodecs harness to unit-test playback paths.
  • Lock the jitter-buffer monotonic-skip invariant with a regression test.

Verification

Rust cargo test: 92 passing · TS vitest: 107 passing · tsc --noEmit: clean.

Known not-fixed (tracked in BUGS-TODO.md): global message-queue atomicity (#7), send-stuck-Pending on crash (#8), admin-handoff loss (#18), MLS commit reorder-resync (#3 carry-over). Each needs a fault-injection / multi-node harness larger than the fix itself, so the work is deferred until the symptom actually appears.

Introduce MessageIdGen ({sent_at_ms}-{seq}) so message ids no longer derive
from messages.len(), which stays flat when upsert replaces a message and lets
two same-millisecond stamps collide. Cell-based so the stamping path stays &self.
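
The generator described above is Rust (Cell-based, stamping through &self). As a rough illustration of the same idea, here is a minimal TypeScript sketch; the class shape and the `next` method name are assumptions, not the code from this PR. The point is that the seq comes from a counter that only ever increases, so it does not depend on how many messages the list currently holds.

```typescript
// Hypothetical TypeScript analogue of the per-session generator described above.
// Ids have the shape `${sentAtMs}-${seq}`. The seq never resets within a session,
// so two messages stamped in the same millisecond still get distinct ids.
class MessageIdGen {
  private seq = 0;

  next(sentAtMs: number): string {
    const id = `${sentAtMs}-${this.seq}`;
    this.seq += 1;
    return id;
  }
}

const idGen = new MessageIdGen();
const firstId = idGen.next(1750000000000);
const secondId = idGen.next(1750000000000); // same millisecond, different seq
console.log(firstId, secondId);
```

An upsert that replaces a message leaves the counter untouched, which is exactly the case where a len()-based id would have repeated.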
Add CallState::end_kind so both the local and remote CallEnd paths label by
phase, not by (duration==0 && reason=="no_answer"). A peer hangup of an
unanswered call no longer logs as a completed 0s call.
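
To make the difference concrete, here is a small TypeScript sketch of phase-based labelling next to the old heuristic. It is an illustration only: the real end_kind lives on the Rust CallState, and the phase names other than Active, the "hangup" reason string, and both function names are assumptions.

```typescript
// Hypothetical phases; only "Active" is named in the description above.
type CallPhase = "Ringing" | "Connecting" | "Active";
type EndKind = "missed" | "completed";

interface CallState {
  phase: CallPhase;
}

// New rule: a call is "missed" unless it reached Active.
function endKind(state: CallState): EndKind {
  return state.phase === "Active" ? "completed" : "missed";
}

// Old rule, for comparison: only a 0s call with reason "no_answer" was missed.
function legacyIsMissed(durationSeconds: number, reason: string): boolean {
  return durationSeconds === 0 && reason === "no_answer";
}

// A peer hangs up while the call is still ringing (reason string assumed).
console.log(endKind({ phase: "Ringing" })); // labelled by phase
console.log(legacyIsMissed(0, "hangup")); // old heuristic does not call it missed
```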
…orage

from_snapshot now returns Result; a truncated/tampered MLS snapshot errors
rather than restoring an empty store that masquerades as a fresh session.
Also document add_peer as 2-party-only (it discards the commit).
drain_inbound now logs and skips a frame that fails to decrypt instead of
aborting the whole drain (which also failed the caller). Track processed
commits so the joiner's own admission commit and gossip duplicates are no-ops.
Switch message ids to the monotonic generator.
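
The drain behaviour above (skip a frame that fails to decrypt, treat an already-processed commit as a no-op) can be sketched in TypeScript as follows. This is not the Rust drain_inbound: the frame shape, the `commitId` field, and the caller-supplied `decrypt` function are all assumptions made for the example.

```typescript
// Hypothetical frame shape; the real MLS frames are not shown in this PR text.
interface InboundFrame {
  commitId: string;
  payload: string;
}

// `decrypt` is a stand-in supplied by the caller; it throws on a bad frame.
function drainInbound(
  frames: readonly InboundFrame[],
  decrypt: (frame: InboundFrame) => string,
  processedCommits: Set<string>,
): string[] {
  const applied: string[] = [];
  for (const frame of frames) {
    if (processedCommits.has(frame.commitId)) {
      continue; // duplicate commit: no-op
    }
    let plaintext: string;
    try {
      plaintext = decrypt(frame);
    } catch (err) {
      console.warn("skipping undecryptable frame", frame.commitId, err);
      continue; // keep draining instead of aborting the whole batch
    }
    processedCommits.add(frame.commitId);
    applied.push(plaintext);
  }
  return applied;
}

const processedCommits = new Set<string>();
const drained = drainInbound(
  [
    { commitId: "c1", payload: "ok-1" },
    { commitId: "c2", payload: "bad" },
    { commitId: "c1", payload: "ok-1" }, // gossip duplicate
    { commitId: "c3", payload: "ok-3" },
  ],
  (frame) => {
    if (frame.payload === "bad") throw new Error("decrypt failed");
    return frame.payload;
  },
  processedCommits,
);
console.log(drained);
```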
Remote CallEnd now classifies via CallState::end_kind; message ids come from
the per-session generator instead of messages.len().

Drop the messages.len()-based id in favor of the per-session generator.

resolve_request_end uses saturating_add so a hostile Range (bytes=0-<u64::MAX>)
can't panic in debug or wrap to an empty window in release.

A transient keychain failure at boot is no longer memoized for the whole
process; cache_on_success retries until use_native_store succeeds.
One torn JSONL record after a crash no longer fails the entire conversation
read; bad lines are logged and skipped.
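
A minimal TypeScript sketch of the skip-and-log approach to a torn JSONL record. The function name and the record shape are assumptions; the actual ciphertext-store reader is not shown in this PR text.

```typescript
// Parse a JSONL history file line by line. A line that fails to parse is
// logged and skipped so that one torn record cannot fail the whole read.
function parseHistoryLines(jsonl: string): unknown[] {
  const records: unknown[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") {
      return; // ignore blank lines, e.g. the trailing newline
    }
    try {
      records.push(JSON.parse(line));
    } catch {
      console.warn(`skipping unparseable history line ${index + 1}`);
    }
  });
  return records;
}

// The last record was cut off mid-write by a crash.
const recovered = parseHistoryLines('{"id":"1-0"}\n{"id":"1-1"}\n{"id":"1-');
console.log(recovered.length);
```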
Parse the fragment via a strict fp= prefix and require 32 hex chars, matching
the group path; a bare or malformed fragment is rejected.
Snapshot+increment the seq synchronously before the async seal so two in-flight
frames can't share a nonce. Add an in-flight guard so overlapping 20ms poll
drains can't reorder frames, and check cancelled after each setup await so a
torn-down effect doesn't leak handles or clobber a newer run. Extract the drain
loop into call-drain for unit testing.
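
The two guards above can be sketched as follows. This is an illustration under assumptions, not the call code from this PR: `FrameSender`, the `Seal` callback type, and the `drain` signature are hypothetical. It relies on the fact that an async function runs synchronously up to its first await, so a seq claimed before that point cannot be shared by a second overlapping call.

```typescript
// Stand-in for the async AEAD seal; the real one is not shown in this PR text.
type Seal = (seq: number, payload: Uint8Array) => Promise<Uint8Array>;

class FrameSender {
  private nextSeq = 0;
  private draining = false;

  // The seq is read and incremented before the first await, so two
  // in-flight sends can never pass the same seq to seal().
  async send(payload: Uint8Array, seal: Seal): Promise<Uint8Array> {
    const seq = this.nextSeq;
    this.nextSeq += 1;
    return await seal(seq, payload);
  }

  // Returns false without polling if an earlier drain is still in flight.
  async drain(poll: () => Promise<void>): Promise<boolean> {
    if (this.draining) {
      return false;
    }
    this.draining = true;
    try {
      await poll();
      return true;
    } finally {
      this.draining = false;
    }
  }
}

// Usage: two overlapping sends and two overlapping drains, none awaited.
const sealedSeqs: number[] = [];
const recordingSeal: Seal = async (seq, payload) => {
  sealedSeqs.push(seq);
  return payload;
};
let pollCount = 0;
const countingPoll = async (): Promise<void> => {
  pollCount += 1;
};

const frameSender = new FrameSender();
void frameSender.send(new Uint8Array([1]), recordingSeal);
void frameSender.send(new Uint8Array([2]), recordingSeal);
void frameSender.drain(countingPoll);
void frameSender.drain(countingPoll); // skipped: first drain still in flight
```

Had the increment been placed after the await, both sends would have passed the same seq to the seal.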
sealFrame throws when seq exceeds the 63-bit value space rather than masking it
down and silently reusing a low seq's nonce.
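
A sketch of the range check, assuming a counter layout in which the direction occupies the top bit and the seq the low 63 bits. That layout, and the names `MAX_SEQ` and `nonceCounter`, are assumptions for illustration; the real sealFrame is not shown in this PR text.

```typescript
// Largest seq that fits in 63 bits.
const MAX_SEQ = (1n << 63n) - 1n;

// Hypothetical layout: direction in bit 63, seq in the low 63 bits.
// An out-of-range seq is rejected. Masking it (seq & MAX_SEQ) would map
// 2^63 back to 0 and silently reuse the nonce of an earlier frame.
function nonceCounter(seq: bigint, direction: 0 | 1): bigint {
  if (seq < 0n || seq > MAX_SEQ) {
    throw new RangeError(`frame seq ${seq} is outside the 63-bit value space`);
  }
  return (BigInt(direction) << 63n) | seq;
}

console.log(nonceCounter(5n, 1).toString(16));
```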
stop() returns the in-flight promise so the auto-stop timer and a manual stop
can't both call recorder.stop() and trigger InvalidStateError.
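
One way to get this idempotence is to memoize the stop promise. The sketch below assumes a promise-returning recorder behind a hypothetical `RecorderLike` interface; the class and field names are illustrative and are not taken from this PR.

```typescript
// Hypothetical stand-in for the underlying recorder.
interface RecorderLike {
  stop(): Promise<void>;
}

class RecordingSession {
  private readonly recorder: RecorderLike;
  private stopping: Promise<void> | null = null;

  constructor(recorder: RecorderLike) {
    this.recorder = recorder;
  }

  // The first caller triggers recorder.stop(); every later caller gets the
  // same in-flight promise, so the recorder is never stopped twice.
  stop(): Promise<void> {
    if (this.stopping === null) {
      this.stopping = this.recorder.stop();
    }
    return this.stopping;
  }
}

// Usage: the auto-stop timer and a manual stop race each other.
let recorderStopCalls = 0;
const session = new RecordingSession({
  stop: () => {
    recorderStopCalls += 1;
    return Promise.resolve();
  },
});
const fromTimer = session.stop();
const fromButton = session.stop();
```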
Channels/groups dedupe own messages by fingerprint so a same-named peer still
counts and a renamed self does not. Extract notificationBody for the toast text.
…anges

A fingerprint offered in one channel/group no longer stays marked as offered in
another.
Parse the hash with URLSearchParams and require the fp key; a bare #hex or a
stray fp= no longer slips through as a valid fingerprint.
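
A minimal sketch of the stricter hash parsing. `URLSearchParams` is the standard API; the function name, the 32-hex-character length (borrowed from the Rust-side description), and the choice to accept upper-case hex are assumptions.

```typescript
// Return the fingerprint from an invite hash such as "#fp=<32 hex chars>",
// or null if the fp key is missing or its value is not a valid fingerprint.
function parseInviteFingerprint(hash: string): string | null {
  const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
  const fp = new URLSearchParams(fragment).get("fp");
  if (fp === null) {
    return null; // bare "#hex" or a different key such as "xfp"
  }
  return /^[0-9a-f]{32}$/i.test(fp) ? fp : null;
}

const sampleFp = "0123456789abcdef0123456789abcdef";
console.log(parseInviteFingerprint(`#fp=${sampleFp}`)); // accepted
console.log(parseInviteFingerprint(`#${sampleFp}`)); // null: no fp key
console.log(parseInviteFingerprint(`#xfp=${sampleFp}`)); // null: wrong key
console.log(parseInviteFingerprint("#fp=")); // null: empty value
```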
Filter focusable elements by closest('[aria-hidden=true],[inert]') so Tab can't
land on controls hidden via an ancestor.
compactDetail JSON-stringifies object/array values instead of rendering
[object Object].
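
For illustration, a sketch of the serialization rule. The `key=value` output format and the signature are assumptions; only the JSON-stringify behaviour for object and array values comes from the description above.

```typescript
// Render a diagnostics detail map on one line. Object and array values are
// JSON-stringified; String(value) on them would print "[object Object]".
function compactDetail(detail: Record<string, unknown>): string {
  return Object.entries(detail)
    .map(([key, value]) => {
      const rendered =
        typeof value === "object" && value !== null
          ? JSON.stringify(value)
          : String(value);
      return `${key}=${rendered}`;
    })
    .join(" ");
}

const detailLine = compactDetail({ code: 7, peer: { fp: "ab" }, tags: ["x", "y"] });
console.log(detailLine);
```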
jsdom ships no Web Audio or WebCodecs, so tests stub these globals with
inspectable doubles (context, oscillator, gain, buffer source, decoder).
… backstop

Schedule an oscillator stop after the ring pattern and close the AudioContext
on ended, so a missed stop() can no longer leak the context for the session.
…ap-aware

Resync scheduling to the clock when it drifts past 200ms after a stall instead
of growing latency for the rest of the call. Thread the frame seq into
pushFrame and derive the decoder timestamp from it (masking the direction bit)
so frames the jitter buffer dropped aren't labelled contiguous.
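
Both changes can be sketched as pure functions. This is one plausible reading of the description, not the playback code itself: the 20 ms frame duration, the position of the direction bit, the exact resync rule, and both function names are assumptions. Only the 200 ms threshold comes from the text above.

```typescript
const MAX_DRIFT_SECONDS = 0.2; // 200 ms threshold from the description above
const FRAME_DURATION_US = 20000; // assumption: 20 ms audio frames
const SEQ_MASK = (1n << 63n) - 1n; // assumption: direction lives in bit 63

// Pick the start time for the next buffer. If the schedule has fallen behind
// the clock, or has run more than 200 ms ahead of it, snap back to the clock
// instead of carrying the extra latency for the rest of the call.
function nextStartTime(scheduledAt: number, now: number): number {
  if (scheduledAt < now || scheduledAt - now > MAX_DRIFT_SECONDS) {
    return now;
  }
  return scheduledAt;
}

// Derive the decoder timestamp from the frame seq (direction bit masked off),
// so a dropped frame leaves a gap instead of being labelled contiguous.
function decoderTimestampUs(seq: bigint): number {
  return Number(seq & SEQ_MASK) * FRAME_DURATION_US;
}

console.log(nextStartTime(10.5, 10)); // too far ahead: resync to the clock
console.log(decoderTimestampUs(3n), decoderTimestampUs(5n)); // seq 4 was dropped
```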
nextActiveTarget force-selected sessions[0] whenever the current target was
not in the latest snapshot, so an in-flight 1s poll that returned before a
just-created session appeared would yank the user out of the chat they had
just opened.

Track every target id ever observed (seenRef) and pass it to nextActiveTarget.
A non-null current that was never seen is kept (freshly created, poll has not
caught up); it only auto-switches away once the target was seen and is now
gone (real delete), or when current is null.
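
The selection rule above, written out as a pure function. The signature is an assumption (target ids as strings, the seen set passed in explicitly); the actual hook code is not shown in this PR text.

```typescript
// Decide which chat target should be active after a snapshot poll.
function nextActiveTarget(
  current: string | null,
  sessions: readonly string[],
  seen: ReadonlySet<string>,
): string | null {
  const fallback = sessions[0] ?? null;
  if (current === null) {
    return fallback; // nothing selected yet
  }
  if (sessions.includes(current)) {
    return current; // still present in the snapshot
  }
  if (!seen.has(current)) {
    return current; // freshly created, the poll has not caught up yet
  }
  return fallback; // was seen before and is now gone: a real delete
}

// A stale poll returns before the just-created session "new" appears.
console.log(nextActiveTarget("new", ["a", "b"], new Set(["a", "b"])));
// Session "a" was seen earlier and has since been deleted.
console.log(nextActiveTarget("a", ["b"], new Set(["a", "b"])));
```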
The follow-up snapshot poll scheduled in refresh's finally block had no
cleanup, so it would run on an unmounted hook (wasted gateway calls + setState
on a dead component). refreshRef was also assigned during render.

Store the timeout id and clear it in the effect cleanup; assign refreshRef
inside the effect instead of during render.
Bump version and document the bug-hunt fixes in the changelog.
@ForeverInLaw merged commit b1f032b into main on Jun 17, 2026
2 checks passed
@ForeverInLaw deleted the fix/bug-hunt branch on June 17, 2026 at 09:58