Environment
- str0m version: latest (both sides)
- Setup: str0m-based SFU server + str0m-based load test client (str0m-stress), both using Rtc::builder().build()
- Network: client runs on a remote machine, SFU in Docker on a VPS (2 cores), UDP ports 16384-16385 mapped through Docker
- Scale: 2-30+ concurrent clients connecting to the same SFU
Symptom
ICE completes successfully for all clients (Checking → Connected → Completed), but DTLS only completes for ~20-30% of connections. The remaining clients never get past the DTLS handshake, no data channel opens, no media flows, and the connection eventually times out.
Observed behavior
Working client (minority):
ICE state -> Connected
ICE state -> Completed
DTLS setup is: true
ClientHello: DTLS version=DTLS1_2, cookie_len=0, offering 3 cipher suites
[ServerHello received, handshake completes]
ChannelOpen("data")
MediaData received ✓
Failing client (majority):
ICE state -> Connected
ICE state -> Completed
DTLS setup is: true
ClientHello: DTLS version=DTLS1_2, cookie_len=0, offering 3 cipher suites
[No ServerHello — nothing comes back]
Flight timeout in: 1.189s
[Retransmit ClientHello]
Flight timeout in: 0.802s
[Retransmit again, eventually gives up after connect timeout (40s)]
Key data points
- ICE works fine for all clients — STUN binding succeeds, candidate pairs are validated, consent checks flow at ~1 pps. UDP connectivity is not the issue.
- The server receives the DTLS ClientHello — our packet dispatch logs confirm the UDP packet is received on the correct socket, dispatched to the correct Rtc instance via handle_input(Input::Receive(...)). No packets are dropped.
- The server does not produce a DTLS response: after handle_input processes the ClientHello, poll_output() does not return a Transmit containing a ServerHello/HelloVerifyRequest. The SFU's event loop calls poll_output in a tight loop until Timeout, so it's not a missed-poll issue.
- The SFU runs 2 threads with one UdpSocket per thread. Each client is assigned to a specific thread/socket via hash-based sharding. The working and failing clients are distributed across both threads, so it's not a per-thread issue.
- All clients share the same pre-generated DtlsCert on the server side (via Arc). The client side generates a fresh cert per Rtc::builder().build().
- Timing pattern: the first 1-2 clients almost always succeed. As more clients connect concurrently (even just 4-8 total), the DTLS success rate drops dramatically. With 8 concurrent clients, typically only 1-2 complete DTLS.
- When a client is torn down and reconnected (new Rtc, new WHIP POST, new UDP socket), it sometimes succeeds on the retry, suggesting the issue is timing/state-dependent rather than a permanent crypto mismatch.
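To make the dispatch logs show which DTLS message each datagram carries (in particular, whether the server ever emits its own ClientHello, which would support the role-conflict theory below), a small std-only classifier could be run on both inbound packets and the outgoing datagram bytes. This is a sketch based on the RFC 7983 first-byte demultiplexing rule and the DTLS 1.2 record layout (RFC 6347); `classify` and `DtlsKind` are hypothetical names, not part of str0m, and only the first record in each datagram is inspected.

```rust
/// Rough classification of the first DTLS record in a UDP datagram.
#[derive(Debug, PartialEq)]
enum DtlsKind {
    NotDtls,
    Truncated,
    ClientHello,
    ServerHello,
    HelloVerifyRequest,
    OtherHandshake(u8),
    /// Handshake record with epoch > 0, e.g. an encrypted Finished.
    Encrypted,
    /// Non-handshake record: change_cipher_spec (20), alert (21), application_data (23).
    OtherRecord(u8),
}

fn classify(datagram: &[u8]) -> DtlsKind {
    let first = match datagram.first() {
        Some(&b) => b,
        None => return DtlsKind::NotDtls,
    };
    // RFC 7983: a first byte in 20..=63 means DTLS (STUN is 0..=3, RTP/RTCP 128..=191).
    if !(20..=63).contains(&first) {
        return DtlsKind::NotDtls;
    }
    if first != 22 {
        return DtlsKind::OtherRecord(first);
    }
    // 13-byte record header: type(1) version(2) epoch(2) sequence(6) length(2),
    // followed by the handshake header whose first byte is the message type.
    if datagram.len() < 14 {
        return DtlsKind::Truncated;
    }
    let epoch = u16::from_be_bytes([datagram[3], datagram[4]]);
    if epoch != 0 {
        return DtlsKind::Encrypted;
    }
    match datagram[13] {
        1 => DtlsKind::ClientHello,
        2 => DtlsKind::ServerHello,
        3 => DtlsKind::HelloVerifyRequest,
        t => DtlsKind::OtherHandshake(t),
    }
}
```

Logging `classify` on the server's outgoing datagrams for a failing client would show directly whether nothing is sent, a ServerHello/HelloVerifyRequest is sent but lost, or a ClientHello is sent (both sides active).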
SDP setup
- Client (offerer): creates offer with a=setup:actpass
- Server (answerer): accept_offer() generates answer — presumably with a=setup:active (server initiates DTLS)
The client-side log shows DTLS setup is: true followed by ClientHello, which suggests the client thinks it's the DTLS initiator (active role). If the server also considers itself active (from the SDP answer), both sides would send ClientHello to each other, and neither would respond with ServerHello — a role conflict deadlock.
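The role-conflict scenario can be made concrete with the offer/answer rules from RFC 5763 (updated by RFC 8842): the offerer says `actpass`, the answerer must pick `active` or `passive`, and the `active` side is the DTLS client that sends the ClientHello. A std-only sketch of that decision table; `Setup`, `DtlsRole` and `resolve` are hypothetical names, not str0m types.

```rust
/// Values of the SDP `a=setup` attribute.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Setup {
    Active,
    Passive,
    ActPass,
}

/// Client = sends the ClientHello; Server = waits for it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DtlsRole {
    Client,
    Server,
}

/// Returns (offerer_role, answerer_role), or an error for invalid/conflicting pairs.
fn resolve(offer: Setup, answer: Setup) -> Result<(DtlsRole, DtlsRole), String> {
    match (offer, answer) {
        // The answer must commit to one role.
        (_, Setup::ActPass) => Err("answer must be active or passive, not actpass".into()),
        // Answerer active: it sends the ClientHello, the offerer waits.
        (Setup::ActPass | Setup::Passive, Setup::Active) => {
            Ok((DtlsRole::Server, DtlsRole::Client))
        }
        // Answerer passive: the offerer sends the ClientHello.
        (Setup::ActPass | Setup::Active, Setup::Passive) => {
            Ok((DtlsRole::Client, DtlsRole::Server))
        }
        // active/active or passive/passive: nobody (or everybody) initiates.
        (o, a) => Err(format!("conflicting setup: offer {o:?}, answer {a:?}")),
    }
}
```

With the client offering `actpass`, only an answer of `a=setup:passive` makes the client the side that sends the ClientHello. If the answer actually says `active` while the client still sends a ClientHello, one side has derived its role incorrectly, which is the both-active deadlock described above.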
However, we haven't confirmed this theory because we can't easily inspect the SDP a=setup attribute at runtime. If both sides end up as DTLS active, the handshake would deadlock exactly as observed.
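Both SDPs already exist as text on the wire (the offer in the WHIP POST body, the answer in the HTTP response), so the `a=setup` values can be checked by logging those strings and scanning them. A std-only sketch; `setup_values` and `consistent_setup` are hypothetical helpers. An SDP carries one `a=setup` line per media section (here audio and the data channel), so all of them are collected.

```rust
/// Collect the value of every `a=setup:` attribute line in an SDP blob.
/// SDP lines are CRLF-terminated; `str::lines()` strips both `\n` and `\r\n`.
fn setup_values(sdp: &str) -> Vec<String> {
    sdp.lines()
        .filter_map(|line| line.trim().strip_prefix("a=setup:"))
        .map(|v| v.trim().to_string())
        .collect()
}

/// Return the setup value if every media section agrees, otherwise None
/// (also None when the SDP has no `a=setup` line at all).
fn consistent_setup(sdp: &str) -> Option<String> {
    let values = setup_values(sdp);
    let first = values.first()?.clone();
    values.iter().all(|v| *v == first).then_some(first)
}
```

On the server, this could be applied to the incoming offer and to the answer string before it goes into the HTTP response; on the client, to the answer it receives. Logging both for a failing client would confirm or rule out the role-conflict theory.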
Reproduction
// Server side (SFU)
let mut rtc = Rtc::builder()
    .set_dtls_cert(shared_cert.clone())
    .set_fingerprint_verification(true)
    .enable_bwe(Some(Bitrate::kbps(1000)))
    .build(Instant::now());
let candidate = Candidate::host(local_addr, "udp")?;
rtc.add_local_candidate(candidate);
let answer = rtc.sdp_api().accept_offer(client_offer)?;
// → send answer back to client via HTTP
// → send Rtc to event loop thread

// Client side (bench tool)
let mut rtc = Rtc::builder()
    .set_stats_interval(Some(Duration::from_secs(2)))
    .build(Instant::now());
let candidate = Candidate::host(local_addr, "udp")?;
rtc.add_local_candidate(candidate);
let mut change = rtc.sdp_api();
change.add_media(MediaKind::Audio, Direction::SendOnly, None, None, None);
change.add_channel("data".to_string());
// apply() yields an Option (None if there are no changes), not a Result
let (offer, pending) = change.apply().expect("SDP change produced no offer");
// → send offer to server via WHIP POST, receive server_answer
let answer = SdpAnswer::from_sdp_string(&server_answer)?;
rtc.sdp_api().accept_answer(pending, answer)?;
// Event loop: poll_output → Transmit (send), recv_from → Input::Receive
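The event-loop contract in the last comment (drain `poll_output` until it returns a timeout, then wake up by that deadline) can be modeled without str0m. `FakeRtc`, the simplified `Output` enum and `drain` are hypothetical stand-ins that only mirror the shape of str0m's `Output::Transmit` / `Output::Timeout`; they are not the crate's API.

```rust
use std::collections::VecDeque;
use std::time::Instant;

/// Simplified stand-in for str0m's `Output` (hypothetical; the real enum also
/// carries events, and its `Transmit` has source/destination addresses).
enum Output {
    Transmit(Vec<u8>),
    Timeout(Instant),
}

/// Hypothetical session that yields its queued datagrams, then a timeout.
struct FakeRtc {
    queued: VecDeque<Vec<u8>>,
    next_timeout: Instant,
}

impl FakeRtc {
    fn poll_output(&mut self) -> Output {
        match self.queued.pop_front() {
            Some(datagram) => Output::Transmit(datagram),
            None => Output::Timeout(self.next_timeout),
        }
    }
}

/// Send every pending datagram and stop at the first Timeout, returning its
/// deadline so the caller can bound its next socket read by it.
fn drain(rtc: &mut FakeRtc, send: &mut impl FnMut(Vec<u8>)) -> Instant {
    loop {
        match rtc.poll_output() {
            Output::Transmit(datagram) => send(datagram),
            Output::Timeout(deadline) => return deadline,
        }
    }
}
```

In the real loop, the returned deadline would bound the `recv_from` timeout; when it passes, feeding `Input::Timeout(Instant::now())` into `handle_input` and draining again is what drives time-based work in str0m's sans-IO model, such as the DTLS flight retransmits seen in the failing client's log. Draining after every `handle_input` call, for every `Rtc` on the thread, matters just as much.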
Connect 8+ clients simultaneously to reproduce. The first 1-2 usually succeed; the rest hang at DTLS.
Questions
- Is the DTLS role (active/passive) correctly derived from the SDP a=setup attribute in accept_offer / accept_answer? Could both sides end up as active?
- Is there a known concurrency issue when multiple Rtc instances on the same thread/process perform DTLS handshakes simultaneously?
- Could the shared DtlsCert cause issues when multiple handshakes use the same certificate concurrently?
Note: I used AI to write this issue because I'm French and wanted it to be clear.