Skip to content

fix: handle outbound re-INVITE routing, SDP echo, and direction negotiation#603

Closed
rtnpro wants to merge 18 commits intolivekit:mainfrom
Bigly-Sales:claude/epic-neumann
Closed

fix: handle outbound re-INVITE routing, SDP echo, and direction negotiation#603
rtnpro wants to merge 18 commits intolivekit:mainfrom
Bigly-Sales:claude/epic-neumann

Conversation

@rtnpro
Copy link

@rtnpro rtnpro commented Mar 2, 2026

Summary

  • Re-INVITE routing: Route mid-dialog re-INVITEs for outbound calls via Client.onInvite with byCallID lookup, preventing them from being misidentified as new inbound calls (which caused 486 rejection and call teardown)
  • SDP echo fix: Store outbound SDP as an independent copy (ownSDP) so re-INVITE responses contain LiveKit's SDP, not the carrier's SDP echoed back (which caused the carrier to loop audio to itself)
  • SDP direction negotiation: Parse incoming re-INVITE direction attributes and respond with the RFC 3264 complement (sendonly → recvonly, etc.), supporting hold/resume/transfer scenarios in call center environments
  • Media destination update: Update RTP destination when the carrier's media address changes after a call transfer, skipping 0.0.0.0 hold addresses

Test plan

  • Integration test TestSIPOutboundReInvite — verifies re-INVITE gets 200 OK with correct SDP and direction
  • Integration test TestSIPOutboundReInviteCallDrop — verifies re-INVITE is not rejected as new inbound call when matching inbound trunk exists
  • Unit tests for SDP direction utilities (TestGetMediaDirection, TestComplementDirection, TestSetMediaDirection, TestGetRemoteMediaAddr)
  • Unit tests for hold/resume/inactive/no-SDP re-INVITE scenarios (TestHandleReInviteHold, TestHandleReInviteResume, TestHandleReInviteInactive, TestHandleReInviteNoSDP)
  • Unit tests for SDP echo prevention (TestOutboundReInviteSDPNotEchoed)
  • Manual test with SIP trunk carrier for hold/transfer flows

briankwest and others added 18 commits February 6, 2026 16:31
Fixes a critical bug where mid-dialog re-INVITEs from the remote party
were incorrectly processed as new inbound calls instead of being
recognized as part of an existing dialog.

## Problem
When a SIP provider (e.g., SignalWire) sends a re-INVITE during an active
call, LiveKit was treating it as a new call, causing:
- Authentication challenges (401/407)
- "No trunk found" errors (404)
- Call failures and disconnections

This primarily affected outbound calls where From/To headers are "swapped"
from LiveKit's perspective (remote is UAC for the re-INVITE).

## Root Cause
The existing dialog matching happened too late (after processInvite) and
only checked byCallID, which is unreliable. The code missed the definitive
proof: the To tag in the request matches our local tag.

Per RFC 3261 Section 12.2, a dialog is identified by:
- Call-ID
- Local tag (in To header for UAS)
- Remote tag (in From header for UAS)

## Solution
Add early dialog matching in onInvite() before processing as new call:
1. Extract To tag from incoming INVITE
2. Check if tag exists in byLocalTag map
3. If match found, delegate to existing call's handleReinvite()
4. Otherwise, proceed with normal new call processing

New handleReinvite() method on inboundCall:
- Validates CSeq (detect retransmissions vs new re-INVITEs)
- Responds with current SDP for session refresh
- Handles edge cases (old CSeq, missing headers)

## Impact
✅ Fixes outbound call re-INVITEs (broken → working)
✅ Improves inbound call re-INVITEs (performance + reliability)
✅ RFC 3261 compliant dialog identification
✅ Reduced CPU usage (early exit avoids processInvite)
✅ Better logging for debugging

## Testing
Tested with SignalWire re-INVITEs on both inbound and outbound calls.
Verified proper 200 OK responses and continued call operation.
The nonce was using only time.Now().UnixMicro() which could produce
identical values for calls arriving in the same microsecond, causing
the TestDigestAuthSimultaneousCalls test to fail.

Now includes the Call-ID in the nonce to ensure uniqueness even when
multiple authentication challenges are generated simultaneously.

This is a defensive fix to prevent race conditions in digest auth.
Adds comprehensive test coverage for the new handleReinvite() method:
- Missing CSeq header (400 Bad Request)
- Retransmission detection (same CSeq → resend 200 OK)
- Out-of-order INVITE (lower CSeq → ignore)
- New re-INVITE (higher CSeq → accept with current SDP)
- No SDP available error handling (500 Internal Server Error)
- Nonce uniqueness verification

These tests cover the main code paths added by the re-INVITE fix,
improving code coverage for the new functionality.
- Extract nonce generation into generateNonce() helper function
- Add nil check and logging when SDP unavailable during retransmission
- Use AcceptAsKeepAlive() helper to ensure proper SIP headers
- Remove redundant Call-ID based re-INVITE fallback in processInvite()
- Improve TestNonceUniqueness to test actual helper function

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Remove unused tr variable
- Extract To tag from re-INVITE request as local tag for newInbound
- Fixes type mismatch (RemoteTag vs LocalTag)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Instead of creating a temporary sipInbound (which requires server
infrastructure and fails in unit tests), directly create the response
following the same pattern as AcceptAsKeepAlive/respondWithData:
- Add Content-Type header
- Add Allow header with supported methods
- Add Contact header
- Add extra headers via addExtraHeaders

This satisfies the intent of using proper SIP headers while working
in both production and test scenarios.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add defensive nil checks before accessing contact and calling
addExtraHeaders to prevent panics in test scenarios where the
full server infrastructure isn't initialized.

- Check c.cc.contact != nil before appending Contact header
- Check c.s != nil && c.s.conf != nil before calling addExtraHeaders

This allows unit tests to work with minimal inboundCall setup while
maintaining full functionality in production.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Introduce a graceful shutdown mechanism for the SIP service.
- Add `StartDrain` methods to `sip.Client`, `sip.Server`, and `sip.Service`. These methods stop accepting new inbound SIP INVITEs and new outbound `CreateSIPParticipant` requests, respectively, while allowing existing calls to complete.
- Integrate the `StartDrain` call into the service shutdown sequence in `service.Run()` to immediately prevent new call establishments.
- Add a `DisableOutboundCalls` configuration option to prevent new outbound SIP calls. When enabled, `CreateSIPParticipant` requests will be rejected, and affinity will be zeroed out.
- Enhance the health check endpoint:
- Expose `/healthz` in addition to `/`.
- Implement a 30-second grace period after service stop before exiting, allowing load balancers to detect unhealthy status.
- Update `.gitignore` to include `.claude`.

Changes:
- .gitignore (2 lines, 7 characters)
- cmd/livekit-sip/main.go (2 lines, 219 characters)
- pkg/config/config.go (4 lines, 274 characters)
- pkg/service/service.go (26 lines, 1142 characters)
- pkg/sip/client.go (6 lines, 213 characters)
- pkg/sip/inbound.go (7 lines, 429 characters)
- pkg/sip/server.go (6 lines, 220 characters)
- pkg/sip/service.go (14 lines, 522 characters)
- test/cloud/service.go (2 lines, 263 characters)
…matching

Fix re-INVITE dialog matching using RFC 3261 compliant To tag lookup
Add TestSIPOutboundReInvite that proves mid-dialog re-INVITEs on
outbound calls are misidentified as new inbound calls and rejected
as "flood" (status 486), killing the active call.

Also fix pre-existing compilation error in sip_test.go where
service.NewService was missing the StartDrain parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…NVITE

Add TestSIPOutboundReInviteCallDrop that reproduces the production
scenario where a mid-dialog re-INVITE on an outbound call gets 486
Rejected and the call is torn down with BYE. This happens when an
inbound trunk matches the reversed call direction (common with SBC
session refresh).

Also refactor shared UAS setup and re-INVITE helpers into reusable
functions (setupUAS, sendReInvite, waitForFinalResponse).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Re-INVITEs for outbound calls were misidentified as new inbound calls
because Server.onInvite() only checked its own byLocalTag map (inbound
calls) and never delegated to the Client. This adds the missing link:
Server delegates unmatched To-tag INVITEs to sipUnhandled (Client),
Client looks up outbound calls by SIP Call-ID, and responds with 200 OK.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add SendReInvite() to siptest.Client for sending mid-dialog re-INVITEs
within an established call, and TestSIPReInvite integration test that
verifies re-INVITEs are accepted with 200 OK and the call stays active.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…E check

The second ExpectRoomWithParticipants call after re-INVITE was missing
the SIP participant's Metadata, Name, and full Attributes, causing the
strict equality check in compareParticipants to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…P in re-INVITE response

The AcceptReInvite handler used c.invite.Body() which returns a reference
to the SIP request's internal buffer. This buffer can become stale after
the transaction terminates, causing the 200 OK to echo the carrier's SDP
back instead of our own — telling the carrier to loop audio to itself.

Store our SDP offer as an independent byte copy (ownSDP) on the
sipOutbound struct, mirroring the pattern used by sipInbound.lastSDP.
Add debug logging of both request and response SDP for production
troubleshooting. Add unit and integration tests verifying the response
SDP contains LiveKit's session (s=LiveKit), not the carrier's.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Handle RFC 3264 direction attributes in re-INVITE responses to support
call hold/resume/transfer scenarios. When a carrier sends a=sendonly
(hold), we now respond with a=recvonly instead of always using
a=sendrecv. Also updates the RTP destination when the carrier's media
address changes after a call transfer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rtnpro rtnpro requested a review from a team as a code owner March 2, 2026 20:07
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 4 committers have signed the CLA.

✅ dennwc
✅ briankwest
❌ Biswajit Pain
❌ rtnpro


Biswajit Pain seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@rtnpro rtnpro closed this Mar 2, 2026
@rtnpro rtnpro deleted the claude/epic-neumann branch March 2, 2026 20:08
@rtnpro rtnpro restored the claude/epic-neumann branch March 2, 2026 20:09
@codecov
Copy link

codecov bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 79.92278% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.63%. Comparing base (0460b40) to head (7fb8e11).
⚠️ Report is 229 commits behind head on main.

Files with missing lines Patch % Lines
pkg/sip/outbound.go 74.24% 10 Missing and 7 partials ⚠️
pkg/siptest/client.go 64.70% 6 Missing and 6 partials ⚠️
pkg/sip/inbound.go 87.14% 7 Missing and 2 partials ⚠️
pkg/service/service.go 40.00% 5 Missing and 1 partial ⚠️
pkg/sip/sdp_direction.go 91.48% 2 Missing and 2 partials ⚠️
pkg/sip/service.go 42.85% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #603      +/-   ##
==========================================
- Coverage   65.25%   63.63%   -1.62%     
==========================================
  Files          51       36      -15     
  Lines        6588     7156     +568     
==========================================
+ Hits         4299     4554     +255     
- Misses       1915     2162     +247     
- Partials      374      440      +66     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants