DTLS handshake fails for most clients in multi-client str0m-to-str0m setup

## Environment

  - str0m version: latest (both sides)                                                                                                                                                        
  - Setup: str0m-based SFU server + str0m-based load test client (str0m-stress), both using Rtc::builder().build()                                                                            
  - Network: client runs on a remote machine, SFU in Docker on a VPS (2 cores), UDP ports 16384-16385 mapped through Docker                                                                   
  - Scale: 2-30+ concurrent clients connecting to the same SFU   

## Symptom

ICE completes successfully for all clients (Checking → Connected → Completed), but DTLS only completes for ~20-30% of connections. The remaining clients never get past the DTLS handshake: no data channel opens, no media flows, and the connection eventually times out.

## Observed behavior

### Working client (minority):
ICE state -> Connected                                                                                                                                                                      
ICE state -> Completed                                                                                                                                                                      
DTLS setup is: true                                                                                                                                                                         
ClientHello: DTLS version=DTLS1_2, cookie_len=0, offering 3 cipher suites                                                                                                                   
[ServerHello received, handshake completes]                                                                                                                                                 
ChannelOpen("data")                                                                                                                                                                         
MediaData received ✓ 

### Failing client (majority): 
ICE state -> Connected                                                                                                                                                                      
ICE state -> Completed                                                                                                                                                                      
DTLS setup is: true                                                                                                                                                                         
ClientHello: DTLS version=DTLS1_2, cookie_len=0, offering 3 cipher suites                                                                                                                
[No ServerHello — nothing comes back]                                                                                                                                                       
Flight timeout in: 1.189s                                                                                                                                                                   
[Retransmit ClientHello]                                                                                                                                                                    
Flight timeout in: 0.802s                                                                                                                                                                   
[Retransmit again, eventually gives up after connect timeout (40s)] 

## Key data points

1. ICE works fine for all clients — STUN binding succeeds, candidate pairs are validated, consent checks flow at ~1 pps. UDP connectivity is not the issue.                                 
2. The server receives the DTLS ClientHello: our packet dispatch logs confirm the UDP packet is received on the correct socket and dispatched to the correct Rtc instance via handle_input(Input::Receive(...)). No packets are dropped.
3. The server does not produce a DTLS response: after handle_input processes the ClientHello, poll_output() does not return a Transmit containing a ServerHello/HelloVerifyRequest. The SFU's event loop calls poll_output in a tight loop until Timeout, so it's not a missed-poll issue.
4. The SFU runs 2 threads with one UdpSocket per thread. Each client is assigned to a specific thread/socket via hash-based sharding. The working and failing clients are distributed across both threads, so it's not a per-thread issue.
5. All clients share the same pre-generated DtlsCert on the server side (via Arc<DtlsCert>). The client side generates a fresh cert per Rtc::builder().build().
6. Timing pattern: the first 1-2 clients almost always succeed. As more clients connect concurrently (even just 4-8 total), the DTLS success rate drops dramatically. With 8 concurrent clients, typically only 1-2 complete DTLS.
7. When a client is torn down and reconnected (new Rtc, new WHIP POST, new UDP socket), it sometimes succeeds on the retry, suggesting the issue is timing/state-dependent, not a permanent crypto mismatch.
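
Points 2 and 3 could be double-checked independently of str0m's own logging by classifying each datagram fed to handle_input, and each payload returned in a Transmit, from its first bytes. The following is a minimal sketch, assuming DTLS 1.2 record framing (13-byte record header) and the first-byte ranges from RFC 7983. `PacketKind`, `DtlsRecord`, and `classify` are hypothetical helper names for illustration, not part of str0m.

```rust
/// Coarse class of a datagram multiplexed on one UDP port (RFC 7983 first-byte ranges).
#[derive(Debug, PartialEq, Eq)]
pub enum PacketKind {
    Stun,
    Dtls(DtlsRecord),
    RtpOrRtcp,
    Other,
}

/// What the first DTLS record in the datagram appears to carry.
#[derive(Debug, PartialEq, Eq)]
pub enum DtlsRecord {
    ClientHello,
    ServerHello,
    HelloVerifyRequest,
    /// Plaintext handshake message of another type (Certificate, etc.).
    OtherHandshake(u8),
    /// Handshake record with epoch > 0: the body is encrypted, so its type is unknown.
    EncryptedHandshake,
    /// ChangeCipherSpec, Alert, ApplicationData, ...
    NonHandshake(u8),
    /// Handshake record too short to contain a message type.
    Truncated,
}

pub fn classify(datagram: &[u8]) -> PacketKind {
    match datagram.first().copied() {
        None => PacketKind::Other,
        Some(0..=3) => PacketKind::Stun,
        Some(20..=63) => PacketKind::Dtls(classify_dtls(datagram)),
        Some(128..=191) => PacketKind::RtpOrRtcp,
        Some(_) => PacketKind::Other,
    }
}

// Assumes `datagram` is non-empty; `classify` guarantees this.
fn classify_dtls(datagram: &[u8]) -> DtlsRecord {
    const CONTENT_TYPE_HANDSHAKE: u8 = 22;
    // type(1) + version(2) + epoch(2) + sequence number(6) + length(2)
    const RECORD_HEADER_LEN: usize = 13;

    let content_type = datagram[0];
    if content_type != CONTENT_TYPE_HANDSHAKE {
        return DtlsRecord::NonHandshake(content_type);
    }
    if datagram.len() <= RECORD_HEADER_LEN {
        return DtlsRecord::Truncated;
    }
    let epoch = u16::from_be_bytes([datagram[3], datagram[4]]);
    if epoch != 0 {
        return DtlsRecord::EncryptedHandshake;
    }
    // First byte after the record header is the handshake message type.
    match datagram[RECORD_HEADER_LEN] {
        1 => DtlsRecord::ClientHello,
        2 => DtlsRecord::ServerHello,
        3 => DtlsRecord::HelloVerifyRequest,
        other => DtlsRecord::OtherHandshake(other),
    }
}
```

Logging the result of `classify` for every received datagram and every transmitted payload, keyed by remote address, would show per client whether a ClientHello arrived and whether any DTLS record was sent back.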

### SDP setup

- Client (offerer): creates offer with a=setup:actpass
- Server (answerer): accept_offer() generates answer — presumably with a=setup:active (server initiates DTLS) 

The client-side log shows DTLS setup is: true followed by ClientHello, which suggests the client thinks it's the DTLS initiator (active role). If the server also considers itself active (from the SDP answer), both sides would send ClientHello to each other and neither would respond with ServerHello: a role-conflict deadlock that would match the behavior observed above.

However, we haven't confirmed this theory because we can't easily inspect the SDP a=setup attribute at runtime.
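
One way the theory could be tested is to scan the raw SDP strings exchanged over HTTP (the offer body and the `server_answer` string) for their a=setup lines and compare them. The following is a minimal sketch that works on plain strings only and does not touch str0m's internals. `Setup`, `setup_attributes`, and `answerer_is_dtls_client` are hypothetical names, and the role rule assumed is the one from RFC 5763: the offer carries actpass, and the answer picks active (answerer sends ClientHello) or passive.

```rust
/// Values of the SDP `a=setup` attribute (RFC 4145).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Setup {
    Active,
    Passive,
    ActPass,
    HoldConn,
}

/// Collects every recognized `a=setup:` value found in an SDP string, in order.
pub fn setup_attributes(sdp: &str) -> Vec<Setup> {
    sdp.lines()
        .filter_map(|line| line.trim().strip_prefix("a=setup:"))
        .filter_map(|value| match value.trim() {
            "active" => Some(Setup::Active),
            "passive" => Some(Setup::Passive),
            "actpass" => Some(Setup::ActPass),
            "holdconn" => Some(Setup::HoldConn),
            _ => None,
        })
        .collect()
}

/// Returns `Some(true)` if the answerer is the DTLS client (sends ClientHello),
/// `Some(false)` if the offerer is, and `None` if the offer/answer pair does not
/// yield exactly one client and one server (for example active/active).
pub fn answerer_is_dtls_client(offer: Setup, answer: Setup) -> Option<bool> {
    match (offer, answer) {
        (Setup::ActPass, Setup::Active) | (Setup::Passive, Setup::Active) => Some(true),
        (Setup::ActPass, Setup::Passive) | (Setup::Active, Setup::Passive) => Some(false),
        _ => None,
    }
}
```

Logging these values per client on both sides, next to the DTLS setup is: line, would show whether the failing connections negotiate a different role pair than the working ones.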

### Reproduction 

```rust
// Server side (SFU)
let mut rtc = Rtc::builder()
    .set_dtls_cert(shared_cert.clone())
    .set_fingerprint_verification(true)
    .enable_bwe(Some(Bitrate::kbps(1000)))
    .build(Instant::now());

let candidate = Candidate::host(local_addr, "udp")?;
rtc.add_local_candidate(candidate);

let answer = rtc.sdp_api().accept_offer(client_offer)?;
// → send answer back to client via HTTP
// → send Rtc to event loop thread

// Client side (bench tool)
let mut rtc = Rtc::builder()
    .set_stats_interval(Some(Duration::from_secs(2)))
    .build(Instant::now());

let candidate = Candidate::host(local_addr, "udp")?;
rtc.add_local_candidate(candidate);

let mut change = rtc.sdp_api();
change.add_media(MediaKind::Audio, Direction::SendOnly, None, None, None);
change.add_channel("data".to_string());
let (offer, pending) = change.apply()?;

let answer = SdpAnswer::from_sdp_string(&server_answer)?;
rtc.sdp_api().accept_answer(pending, answer)?;

// Event loop: poll_output → Transmit (send), recv_from → Input::Receive
```

Connect 8+ clients simultaneously to reproduce. The first 1-2 usually succeed; the rest hang at DTLS.

### Questions

1. Is the DTLS role (active/passive) correctly derived from the SDP a=setup attribute in accept_offer / accept_answer? Could both sides end up as active?
2. Is there a known concurrency issue when multiple Rtc instances on the same thread/process perform DTLS handshakes simultaneously?
3. Could the shared DtlsCert cause issues when multiple handshakes use the same certificate concurrently?

**Note: I used AI to write this issue because I'm French and I want to be clear.**
