Add StartContext to ICE and DTLS Transport#3371
Conversation
|
Seems useful @HashimTheArab! I’m curious what’s the bug/use case your solving for? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3371 +/- ##
==========================================
- Coverage 85.55% 85.34% -0.21%
==========================================
Files 81 81
Lines 9856 9865 +9
==========================================
- Hits 8432 8419 -13
- Misses 1006 1022 +16
- Partials 418 424 +6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
We're using this for Minecraft Nethernet connections (https://github.com/lactyy/go-nethernet) and the transport methods can hang indefinitely if the remote peer never responds. Without this option we have to use this fragile hack: func withContextCancel(ctx context.Context, f func() error, cancel func()) error {
errCh := make(chan error, 1)
go func() {
errCh <- f()
}()
select {
case <-ctx.Done():
if cancel != nil {
cancel()
}
go func() { <-errCh }()
return ctx.Err()
case err := <-errCh:
return err
}
}Passing a context will allow us to simplify and improve all of the call sites: // Before
if err := withContextCancel(ctx, func() error {
return conn.ice.Start(nil, d.ice, &iceRole)
}, func() {
_ = conn.ice.Stop()
}); err != nil {
return fmt.Errorf("start ICE: %w", err)
}
// After
if err := conn.ice.StartContext(ctx, nil, d.ice, &iceRole); err != nil {
return fmt.Errorf("start ICE: %w", err)
}(I am not the author of the PR) |
|
@RealSexMC this is interesting I'm curious why you don't use pion/ice directly? seems more fit for your use, but I might be missing something. We're currently working on improving pion/ice and making it more suitable for non-webrtc use. |
|
@RealSexMC that is very interesting, couldn't you |
|
NetherNet is a full WebRTC stack, it uses ICE + DTLS + SCTP with data channels (ReliableDataChannel and UnreliableDataChannel) negotiated via SDP. We use pion's ORTC API to set up each transport individually since the signaling is custom (not standard WebSocket/HTTP), but the transports themselves are standard WebRTC. |
|
Anything else required to get this merged? |
|
Can you please fix the lint? |
|
Should be good now |
|
I've updated this to also add a StartContext method to DTLS transport |
|
Hey is anything else needed to merge this? |
|
@HashimTheArab Can you please update this branch and fix the tests before we can merge this. thank you. |
edbbc65 to
e615471
Compare
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
7cac620 to
c9d2f4c
Compare
|
Fixed the issues. The test-i386 failure is from the master branch. Do I need to add additional testing for that codecov/patch? |
|
Hey @JoTurk please let me know what's needed to get this merged, the test coverage is failing due to 66% but I don't see any tests to meaningfully add? |
JoTurk
left a comment
There was a problem hiding this comment.
Sorry for ghosting this PR. I lost track.
| @@ -134,7 +147,7 @@ func (t *ICETransport) Start(gatherer *ICEGatherer, params ICEParameters, role * | |||
| } | |||
| t.role = *role | |||
|
|
|||
| ctx, ctxCancel := context.WithCancel(context.Background()) | |||
| ctx, ctxCancel := context.WithCancel(ctx) | |||
| t.ctxCancel = ctxCancel | |||
|
|
|||
There was a problem hiding this comment.
This is a problem in the current ICE design, but it affects this PR, currently https://github.com/pion/ice/blob/main/transport.go ICE just returns an error if the context is cancelled after the connection checks started, and it doesn't call agent's close by itself (we should change in the next major version). But for this PR we should also call close when the context get cancelled.
The review thread identified that pion/ice starts connectivity checks before context cancellation returns, leaving the agent running unless WebRTC closes it explicitly. This keeps the existing StartContext API shape and routes canceled starts through ICETransport Stop so the gatherer/agent are released. Constraint: PR review requested closing the ICE agent when the StartContext context is canceled after checks begin. Rejected: Changing pion/ice behavior in this PR | reviewer noted that belongs in the next major pion/ice version. Confidence: high Scope-risk: narrow Directive: Keep StartContext cancellation tied to ICETransport shutdown until pion/ice owns cancellation cleanup itself. Tested: go test ./ -run 'TestICETransport_StartContextClosesOnCancel|TestPeerConnection_Close$' -count=1; golangci-lint run --new-from-rev=HEAD; go test ./...
Callers that cancel StartContext should receive the context cancellation error they can inspect, while WebRTC still closes the ICE transport to release the agent. Non-context start failures now clear the temporary internal cancel function instead of leaving it attached to a start that failed. Constraint: PR review requested context cancellation semantics before merge. Rejected: Returning the pion/ice cancellation error on caller cancellation | it hides the caller's context.Canceled or DeadlineExceeded signal. Confidence: high Scope-risk: narrow Directive: Keep cancellation-return semantics aligned with context-aware Go APIs. Tested: go test ./ -run 'TestICETransport_StartContextClosesOnCancel|TestPeerConnection_Close$' -count=1; golangci-lint run --new-from-rev=HEAD Not-tested: go test ./... did not complete cleanly locally due existing network-sensitive mDNS/IPv6 failures in TestPeerConnectionTrickle and Test_IPv6.
| if ctxErr := ctx.Err(); ctxErr != nil { | ||
| t.lock.Unlock() | ||
| _ = t.Stop() | ||
| t.lock.Lock() | ||
|
|
||
| return ctxErr | ||
| } |
There was a problem hiding this comment.
Use context.Cause(ctx) instead of ctx.Err()
There was a problem hiding this comment.
@sirzooro I don't think we would want a Pion api to return aribitary erros espically if this is used with third party created contexts, this will be kinda unpredicatable. and also sounds like a policy or API change across all of pion in a new major version and not just a single API and keep the rest.
There was a problem hiding this comment.
This would be useful when used with complex context trees. Cancellation cause set few levels up the tree is propagated down when context is cancelled. This allows to improve logs, it is immediately obvious that connection was terminated due to something what happened elsewhere. Generic error "context cancelled" does not carry this information.
There was a problem hiding this comment.
I see the point, but it's outside of the scope of this change, we shouldn't introduce a new pattern or error handling randomly in a single new API and keep the rest. Because having unpredictable APIs and maintaining different behaviors is way worse than losing some context, and If we would do it, we should do it across all of pion and we should use a linter to force it, and it should be in a new release.
There was a problem hiding this comment.
Thanks, I am ok with this.
|
|
||
| func TestDTLSTransport_Start_ErrICEConnectionNotStarted(t *testing.T) { | ||
| transport := &DTLSTransport{state: DTLSTransportStateNew} | ||
| transport := &DTLSTransport{api: NewAPI(), state: DTLSTransportStateNew} |
There was a problem hiding this comment.
I'm sorry can we revert this and check if t.api is null? calls with t.api null shouldn't panic.
Summary
StartContextmethod toICETransportthat accepts acontext.Contextparameter, allowing callers to control cancellation and timeouts for ICE connectivity checksStartto delegate toStartContextwithcontext.Background(), maintaining full backwards compatibility🤖 Generated with Claude Code