Skip to content

tests: (p2p) TestBroadcaster is flaky#2821

Draft
iurii-ssv wants to merge 18 commits into
stagefrom
tests-p2p-flaky-broadcaster-test
Draft

tests: (p2p) TestBroadcaster is flaky#2821
iurii-ssv wants to merge 18 commits into
stagefrom
tests-p2p-flaky-broadcaster-test

Conversation

@iurii-ssv
Copy link
Copy Markdown
Contributor

@iurii-ssv iurii-ssv commented Apr 20, 2026

Rebased onto #2819 (only the latest commit here is new).

Resolves flaky pipeline https://github.com/ssvlabs/ssv/actions/runs/24669605966/job/72136402325?pr=2819

--- FAIL: TestBroadcaster (0.11s)
    broadcaster_test.go:46: 
        	Error Trace:	/home/runner/work/ssv/ssv/exporter/api/broadcaster_test.go:46
        	Error:      	Not equal: 
        	            	expected: 0
        	            	actual  : 1
        	Test:       	TestBroadcaster

Related to https://github.com/ssvlabs/ssv-node-board/issues/941 (discovered during investigation).

Both TestRun_ActualExecution and TestRun_ActualExecutionFullMode
repeated the same 5-line listen-then-release-port dance. Factor it
out into a reserveFreePort helper matching the one added to
exporter/api/server_test.go in the prior commit, so the two packages
use the same pattern.

Also drop the now-stale cross-reference comment in the exporter
helper now that api/server has the same thing.
Callers may send on the feed immediately after Start returns; previously
the subscribe happened inside a goroutine, so early sends could be lost.
Split FromFeed into a synchronous Subscribe + an internal pump goroutine
so the subscription is live before Start returns.

Also tidies wsServer.Start: scope httpServer locally, wrap listen error
with fmt.Errorf, and use errors.Is for the ErrServerClosed check.
…sageValidation

The count assertion is incompatible with libp2p gossipsub's peer-score filtering
(misbehaving peers get filtered at the pubsub layer before reaching the
validator callback once their score crosses the threshold), and the strict
ordering assertion flaked because peers with identical rejected-rates (nodes 0
and 1) differ only within scoring noise.

Replace both with a rejected-rate invariant: peers with a strictly lower
rejected-rate must not be ranked below peers with a higher rate. Ties within
the same rate group are allowed.

Also drop dead TotalAccepted/TotalIgnored/TotalRejected fields and the
roleBroadcasts tracker that fed the removed count assertion.
@iurii-ssv iurii-ssv requested review from a team as code owners April 20, 2026 16:11
@iurii-ssv iurii-ssv changed the title Tests p2p flaky broadcaster test tests: (p2p) TestBroadcaster is flaky Apr 20, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 56.73077% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.6%. Comparing base (d18ff74) to head (fd4e932).
⚠️ Report is 1 commits behind head on stage.

Files with missing lines Patch % Lines
network/p2p/p2p_setup.go 20.0% 19 Missing and 1 partial ⚠️
exporter/api/server.go 68.1% 4 Missing and 3 partials ⚠️
api/server/server.go 76.9% 3 Missing and 3 partials ⚠️
network/discovery/local_service.go 0.0% 4 Missing ⚠️
operator/node.go 0.0% 3 Missing ⚠️
cli/operator/node.go 0.0% 2 Missing ⚠️
network/p2p/testutils.go 75.0% 1 Missing and 1 partial ⚠️
network/discovery/testutils.go 0.0% 1 Missing ⚠️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@iurii-ssv iurii-ssv marked this pull request as draft April 20, 2026 16:13
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR fixes the flaky TestBroadcaster by introducing require.Eventually to synchronize on msg1 delivery before deregistering bm2, eliminating the race where the broadcaster's connection snapshot for msg2 could still include bm2. Alongside that core fix, it refactors exporter/api and api/server servers to return the bound address from Start() (enabling :0 port in tests), adds a per-test randomized mDNS service tag to prevent cross-discovery between concurrent test processes, and tightens the peer-score ordering invariant in TestP2pNetwork_MessageValidation to avoid false failures on score-tied peers.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style suggestions with no correctness impact.

The core flakiness fix is logically sound: waiting for both subscribers to confirm receipt of msg1 guarantees the broadcast goroutine has returned before bm2 is deregistered, making the ordering deterministic. The mDNS tag isolation and peer-score invariant simplification are straightforward improvements. The only open item is a stray fmt.Println debug statement in the test mock, which is noisy but harmless.

exporter/api/broadcaster_test.go — remove leftover fmt.Println("sent") from broadcastedMock.Send

Important Files Changed

Filename Overview
exporter/api/broadcaster_test.go Fixed flaky TestBroadcaster by using require.Eventually to synchronize on msg1 delivery before deregistering bm2, eliminating the race; leftover fmt.Println("sent") in broadcastedMock.Send produces noisy test output.
exporter/api/broadcaster.go Added clear doc comment to FromFeed explaining that Subscribe is called synchronously before the goroutine is spawned, guaranteeing delivery ordering; no logic changes.
exporter/api/server.go Start now binds a net.Listener before spawning the serve goroutine and returns the bound address, enabling port-0 usage in tests; broadcaster.FromFeed moved into Start to guarantee subscription happens before serving.
exporter/api/server_test.go Tests updated to use the new Start (string, error) API with :0 port; new waitForCondition helper replaces raw time.Sleep polling; TestHandleStream now properly synchronizes on connection registration.
api/server/server.go Refactored Start to return (string, error) with the bound address, consistent with the exporter/api/server pattern; no behavioral change in production.
api/server/server_test.go New comprehensive test suite covering New constructor, Start lifecycle, middleware behavior, route registration, and graceful shutdown.
network/p2p/testutils.go Added randomMdnsTag() and mdnsTag field to LocalNet; each test network now uses a unique mDNS service tag, preventing cross-discovery between concurrently-running test processes.
network/p2p/config.go Added MdnsDiscoveryTag field to Config, wired through from testutils to NewLocalDiscovery; production path unchanged (empty tag falls back to LocalDiscoveryServiceTag).
network/p2p/p2p_validation_test.go Replaced the strict accepted>ignored>rejected score ordering assertion with a more robust invariant: any peer with a strictly higher rejected-rate must not rank above one with a lower rate; ties are allowed in any order.
network/discovery/local_service.go NewLocalDiscovery now accepts a serviceTag parameter; empty tag falls back to LocalDiscoveryServiceTag, enabling tests to isolate their mDNS groups.

Sequence Diagram

sequenceDiagram
    participant Test
    participant Feed
    participant BroadcastGoroutine
    participant bm1
    participant bm2

    Test->>Feed: Subscribe(buffer) [synchronous]
    Test->>BroadcastGoroutine: spawn goroutine
    Test->>Feed: Send(msg1)
    Feed-->>BroadcastGoroutine: buffer <- msg1
    BroadcastGoroutine->>bm1: Send(msg1)
    BroadcastGoroutine->>bm2: Send(msg1)
    Note over Test: require.Eventually: bm1.Size()==1 && bm2.Size()==1
    Test->>Test: Deregister(bm2) [safe: goroutine idle in select]
    Test->>Feed: Send(msg2)
    Feed-->>BroadcastGoroutine: buffer <- msg2
    BroadcastGoroutine->>bm1: Send(msg2)
    Note over BroadcastGoroutine: bm2 not in snapshot — skipped
    Note over Test: require.Eventually: bm1.Size()==2
    Test->>Test: assert bm2.Size()==1 ✓
Loading

Comments Outside Diff (1)

  1. exporter/api/broadcaster_test.go, line 76 (link)

    P2 Debug fmt.Println left in test mock

    fmt.Println("sent") is a leftover debugging statement that will emit a line to stdout on every message broadcast in this test. It adds noise to CI output without adding any test value.

Reviews (1): Last reviewed commit: "tests: (p2p) TestBroadcaster is flaky" | Re-trigger Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant