signaling: relax announce cadence (storage + reflection do discovery now) by mrjeeves · Pull Request #15 · mrjeeves/MyOwnMesh

mrjeeves · 2026-05-27T05:39:24Z

Summary

Now that PR #14 landed (stored kind 1077 + reactive reflection on every inbound announce), the dense periodic announce schedule we inherited from the ephemeral-kind era is doing redundant work. This PR resizes the cadence to match what the new mechanisms actually need.

Before: 15 publishes per peer in the first 175 s (the initial publish plus 14 scheduled follow-ups), then one every 60 s forever:

0s, 5s, 10s, 15s, 20s, 25s, 35s, 45s, 55s, 70s, 85s, 100s, 115s, 145s, 175s, 235s, 295s, …

After — 2 publishes in the first 30 s, then one every 5 min forever:

0s, 30s, 330s, 630s, …

Roughly an 80 % reduction in publish volume per peer per hour: steady state drops from 60 publishes to 12, and the first hour from 72 to 13.
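To sanity-check the publish counts, here is a minimal sketch that expands both cadences into publish offsets. It is not the engine's scheduler: `publish_times`, `OLD_RAMP`, and `NEW_RAMP` are hypothetical names, and the ramp steps are simply read off the two schedules listed above.

```rust
/// Ramp steps as (interval_s, repeats), applied after the initial publish at t = 0.
/// Values are transcribed from the "Before" and "After" schedules (assumed shape).
const OLD_RAMP: [(u64, u32); 4] = [(5, 5), (10, 3), (15, 4), (30, 2)];
const NEW_RAMP: [(u64, u32); 1] = [(30, 1)];

/// Returns every publish offset (seconds since startup) strictly before `horizon_s`.
/// After the ramp is exhausted the schedule repeats every `steady_s` seconds.
fn publish_times(ramp: &[(u64, u32)], steady_s: u64, horizon_s: u64) -> Vec<u64> {
    assert!(steady_s > 0, "steady interval must be positive");
    let mut times = Vec::new();
    if horizon_s == 0 {
        return times;
    }
    times.push(0); // initial publish at startup
    let mut t = 0u64;
    for &(interval_s, repeats) in ramp {
        for _ in 0..repeats {
            t += interval_s;
            if t >= horizon_s {
                return times;
            }
            times.push(t);
        }
    }
    loop {
        t += steady_s;
        if t >= horizon_s {
            return times;
        }
        times.push(t);
    }
}
```

With a one-hour horizon, `publish_times(&OLD_RAMP, 60, 3600).len()` comes to 72 and `publish_times(&NEW_RAMP, 300, 3600).len()` comes to 13.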

Why this is safe

Each thing the old cadence was protecting against now has a better answer:

| Old job | Now covered by |
| --- | --- |
| Be visible to a fresh joiner in their first few seconds | Stored kind 1077: joiner's `REQ since=now-300s` replays our last stored announce immediately |
| Compensate for ephemerals being dropped | Moot: we use a stored kind |
| Detect a peer is gone | App-level WebRTC heartbeats on the data channel (`HEARTBEAT_INTERVAL_MS=30s`) |
| Wake up a steady-state peer who otherwise wouldn't reply for ~60 s | Engine reflects every inbound `PeerAnnounced` with a fresh `Announce` (rate-limited 1 s) |
| Be visible to a freshly-(re)connected relay | `nostr::driver::run_relay_inner` already publishes a one-shot open-announce per relay |
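The first row relies on the relay replaying stored events to a new subscriber. As a rough illustration of what such a subscription looks like on the wire, the sketch below builds a NIP-01 `REQ` frame with a `since` bound some seconds in the past. `replay_req` and the subscription id are hypothetical, and the real filter presumably also scopes the query to a room (for example via a tag), which is omitted here.

```rust
/// Builds a NIP-01 `REQ` frame asking a relay for stored events of `kind`
/// created within the last `window_s` seconds.
/// Assumption: `sub_id` contains no characters that need JSON escaping.
fn replay_req(sub_id: &str, kind: u32, now_unix: u64, window_s: u64) -> String {
    // Clamp at zero so a small clock value cannot underflow.
    let since = now_unix.saturating_sub(window_s);
    format!(r#"["REQ","{sub_id}",{{"kinds":[{kind}],"since":{since}}}]"#)
}
```

Calling `replay_req("discovery", 1077, now, 300)` corresponds to the `since=now-300s` replay described in the table.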

The only remaining job is refreshing relay storage well inside the retention window. Five minutes is conservative against every public relay I'm aware of (typical retention is hours to days). The 30 s safety-net publish catches a silently-failed first publish at startup.

Test plan

- `cargo test --workspace`: all suites green.
- Manual: bring 3 peers up over the default relay pool, watch the Activity log, verify discovery latency is still snappy (<2 s) and per-peer publish rate dropped.
- Manual: leave a 2-peer room idle for 10 min, confirm a 3rd joiner still finds both within ~1 s of joining (stored kind + REQ replay carries it).

https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg

Generated by Claude Code

…scovery

With stored kind 1077 and engine-side reactive reflection on every inbound PeerAnnounced, the dense early schedule is redundant: a late joiner sees every existing peer's last announce via REQ replay, and existing peers re-publish within ~1s of any inbound announce. Per-relay open-announce in run_relay_inner covers freshly-(re)connected relays.

Collapses the post-startup curve from 14 publishes in 175 s (5s × 5, 10s × 3, 15s × 4, 30s × 2) to one safety-net publish at +30 s, and bumps steady from 60 s → 300 s. The remaining periodic publish only exists to refresh storage well inside the relay's retention window.

Roughly an 80% reduction in publish volume per peer per hour (60 → 12 publishes in steady state) with no impact on discovery latency.

https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg

…tion

Two follow-on connection-quality fixes layered onto the cadence relaxation.

**Re-offer on stuck Sighted**: when an inbound PeerAnnounced arrives for a peer we already have a session with, but that session is still at status Sighted (PC created, data channel never opened) and we're the Offerer, re-create + re-send the SDP offer. webrtc-rs's create_offer calls set_local_description internally, kicking off a fresh ICE gathering cycle on the same PC: no teardown, no PC recreation, the remote handles the renegotiation transparently. Per-peer rate-limited via PeerStateData::last_offer_sent_at (2 s floor) so a REQ-replay burst doesn't fan out to fourteen redundant offers. Together with PR #14's reactive-announce, this is the "no network restart needed to rebuild a stuck connection" property: every announce we hear from a stuck peer prods our handshake forward, and once the channel opens the gate naturally closes.

**Selected-pair classification retry**: webrtc-rs's CandidatePair stats can lag the ICE state callback, particularly on the controlling (Offerer) agent, which only flips `nominated=true` after sending USE-CANDIDATE and getting a response. The single-shot record in handle_ice_state_change therefore sometimes runs before nomination is reflected in stats, and selected_pair stays None forever even though packets are flowing. This is exactly the "laptop says it's not LAN even though it is" symptom on the Offerer side of a working pair. The existing ICE poller (already running every 3 s) now retries record_selected_pair for any Active/Shelved peer with no pair recorded yet. Cheap, self-healing within one poll tick.

https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg
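The re-offer gate described in this commit can be sketched as a small predicate. This is an illustration under assumptions, not the engine's code: only `last_offer_sent_at` and the 2 s floor come from the commit message, while the `Status` and `Role` enums, the struct layout, and `should_reoffer` are hypothetical.

```rust
use std::time::{Duration, Instant};

#[allow(dead_code)]
#[derive(PartialEq)]
enum Status { Sighted, Active, Shelved } // hypothetical mirror of the session states

#[allow(dead_code)]
#[derive(PartialEq)]
enum Role { Offerer, Answerer } // hypothetical

struct PeerStateData {
    status: Status,
    role: Role,
    last_offer_sent_at: Option<Instant>,
}

/// Minimum spacing between re-offers to the same peer (2 s floor).
const REOFFER_FLOOR: Duration = Duration::from_secs(2);

/// Returns true, and stamps `last_offer_sent_at`, when an inbound announce
/// should trigger a fresh SDP offer for this peer.
fn should_reoffer(peer: &mut PeerStateData, now: Instant) -> bool {
    // Only the Offerer re-offers, and only while the session is stuck at Sighted.
    if peer.status != Status::Sighted || peer.role != Role::Offerer {
        return false;
    }
    // Per-peer rate limit so a burst of replayed announces yields one offer.
    if let Some(last) = peer.last_offer_sent_at {
        if now.saturating_duration_since(last) < REOFFER_FLOOR {
            return false;
        }
    }
    peer.last_offer_sent_at = Some(now);
    true
}
```

Once the data channel opens and the status leaves `Sighted`, the predicate returns false on its own, which is the "gate naturally closes" behaviour.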

CI's `cargo fmt --all --check` flagged a multi-line `format!` that rustfmt prefers to collapse. Functional no-op. https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg

webrtc-rs doesn't always flip `nominated=true` on the controlling (Offerer) side, even after ICE is solidly Connected and packets are flowing. Confirmed against a working LAN pair where the laptop (Offerer) stayed unclassified while the answerer correctly painted "LAN". The ICE-poll retry from the previous commit can't recover this case: get_stats() returns nominated=false consistently, not just transiently.

Pick the Succeeded pair with the largest bytes_received as the fallback. If multiple Succeeded pairs have zero bytes (briefly the case right after ICE settles), any of them classifies the same way for LAN / STUN / TURN purposes since they're all viable paths to the same peer. Nominated remains the preferred signal where the agent does set it.

https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg
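A minimal sketch of that selection rule, assuming a simplified stand-in for the candidate-pair stats. `PairStat`, `PairState`, and `pick_selected_pair` are hypothetical and do not reproduce webrtc-rs's actual stats types.

```rust
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum PairState { InProgress, Succeeded, Failed }

/// Simplified stand-in for one candidate-pair stats entry (assumed shape).
#[derive(Clone, Debug)]
struct PairStat {
    id: &'static str,
    state: PairState,
    nominated: bool,
    bytes_received: u64,
}

/// Prefers a nominated pair; otherwise falls back to the Succeeded pair that
/// has received the most bytes. When several pairs tie (for example all at
/// zero bytes), `max_by_key` returns the last of them.
fn pick_selected_pair(pairs: &[PairStat]) -> Option<&PairStat> {
    pairs.iter().find(|p| p.nominated).or_else(|| {
        pairs
            .iter()
            .filter(|p| p.state == PairState::Succeeded)
            .max_by_key(|p| p.bytes_received)
    })
}
```

Keeping nomination first means the fallback only changes behaviour for agents that never report `nominated=true`.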

Field-confirmed root cause of the "Offerer side classifies LAN peer as STUN" symptom: on a fast local network the remote's first trickled ICE candidate (carrying its LAN host address) routinely arrives 50–500 ms ahead of the answer SDP it's associated with. webrtc-rs's `add_ice_candidate` returns "remote description is not set" and the host candidate is silently dropped. ICE then recovers via peer-reflexive discovery from STUN binding probes, which succeeds (packets flow), but the agent's selected pair is now (Host, PeerReflexive) instead of (Host, Host), and the GUI's LAN/STUN/TURN classifier (correctly) paints it as STUN.

Fix: track `remote_description_set` per peer and queue inbound ICE candidates in `pending_remote_candidates` until the first `set_remote_description` succeeds, then drain the queue. The drain happens inside `apply_remote_sdp` after the SDP is in place; the per-peer state lock is dropped before each `add_ice_candidate` await to avoid serializing the rest of the engine on a webrtc call.

drop_peer naturally resets both fields because it removes the PeerConnection entry entirely: a reconnect creates a fresh one with `remote_description_set: false`.

https://claude.ai/code/session_01Vp4cvRTaLYd3162EwwcCXg
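The queue-then-drain behaviour can be sketched without any WebRTC dependency. The field names `remote_description_set` and `pending_remote_candidates` come from the commit message; the `PeerIceState` struct, its methods, and the use of plain strings for candidates are assumptions made for illustration.

```rust
/// Per-peer bookkeeping for early-arriving trickled candidates (assumed shape).
#[derive(Default)]
struct PeerIceState {
    remote_description_set: bool,
    pending_remote_candidates: Vec<String>,
}

impl PeerIceState {
    /// Returns the candidate if it can be applied right away; otherwise
    /// queues it and returns None.
    fn on_remote_candidate(&mut self, candidate: String) -> Option<String> {
        if self.remote_description_set {
            Some(candidate)
        } else {
            self.pending_remote_candidates.push(candidate);
            None
        }
    }

    /// Call once the remote description has been applied successfully.
    /// Hands back the queued candidates in arrival order and leaves the
    /// queue empty, so the caller can release any lock before applying them.
    fn on_remote_description_set(&mut self) -> Vec<String> {
        self.remote_description_set = true;
        std::mem::take(&mut self.pending_remote_candidates)
    }
}
```

Returning owned values from both methods is one way to keep the state lock short: the caller takes the candidates, drops the lock, and only then awaits the WebRTC call for each one.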

claude added 5 commits May 27, 2026 05:38

fmt: rustfmt collapse of re-offer log message

82099f8


mrjeeves merged commit 3846fb5 into main May 27, 2026
6 checks passed

mrjeeves deleted the claude/relax-announce-cadence branch May 27, 2026 07:08

mrjeeves merged 5 commits into main from claude/relax-announce-cadence
