Skip to content

Conversation

@0pcom
Copy link
Collaborator

@0pcom 0pcom commented Nov 29, 2025

go get -u ./...
go mod tidy
go mod vendor
git add go.mod go.sum vendor

Moses Narrow added 16 commits August 23, 2025 18:05
Minor formatting cleanup to trigger CI re-run.
Root Cause:
-----------
The smux library was upgraded to v1.5.44 (likely via `go get -u ./...`),
which introduced breaking changes in stream multiplexing behavior. In
commit 3aef70af, smux changed the shaper channel from unbuffered to
buffered, altering synchronization/timing in the multiplexing layer.

Impact:
-------
TestHTTPTransport_RoundTrip was failing on Darwin CI with errors like:
- "stream closed" on concurrent /hash endpoint requests
- "session shutdown" after first failure
- "connection reset by peer" in DMSG stream layer

The test spawns 3 HTTP clients making 10 concurrent requests each (30
total). Client3's /hash requests consistently failed while /index.html
and /echo requests from other clients succeeded. This pattern indicated
a concurrency issue in the underlying stream multiplexing.

Solution:
---------
Downgrade github.com/xtaci/smux from v1.5.44 to v1.5.40, restoring the
previous unbuffered channel behavior that DMSG was designed for. The
same fix was applied to skywire repository in commit a52c15b00.

Testing:
--------
After downgrade, TestHTTPTransport_RoundTrip passes reliably with all
30 concurrent requests succeeding (verified with race detector enabled).
Improvements to handle concurrent requests more reliably:

1. Added defer-based stream cleanup in RoundTrip
   - Ensures DMSG stream is closed if errors occur after DialStream
   - Prevents stream leaks when req.Write() or ReadResponse() fail

2. Added 10-second timeout to http.Client in tests
   - Prevents test from hanging if streams fail
   - Provides clearer failure mode for debugging

These changes should help with the Darwin CI test failures where
concurrent HTTP requests were experiencing 'stream closed' and 'session
shutdown' errors under the new smux v1.5.44 buffered channel behavior.
Added nolint:errcheck directive with comment explaining this is
best-effort cleanup on an error path where we're already returning
an error to the caller.
@0pcom 0pcom merged commit 3f98a57 into skycoin:develop Nov 29, 2025
3 checks passed
@0pcom 0pcom deleted the update-deps branch November 29, 2025 22:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant