fix(bitswap): skip identify race when peer protocols are already known#1040
Open
Rinse12 wants to merge 1 commit into
Open
fix(bitswap): skip identify race when peer protocols are already known#1040Rinse12 wants to merge 1 commit into
Rinse12 wants to merge 1 commit into
Conversation
Network#connectTo awaits Promise.all([libp2p.dial, raceEvent('peer:identify',
filter for BITSWAP_120)]). peer:identify is single-shot per peer, so if the
peer was already identified before bitswap calls connectTo (common when the
host app dials peers via other subsystems first — pubsub, fetch, kad-dht,
manual dial), raceEvent waits for an event that will never re-fire and
connectTo hangs indefinitely. This in turn blocks findAndConnect's
find-providers fallback, so wantBlock times out instead of resolving from
already-connected bitswap-speaking peers.
Add a fast path: before the race, check libp2p.peerStore.get(peer). If the
peer is already known and advertises BITSWAP_120, return from a plain dial()
(cheap no-op when already connected). Otherwise fall through to the existing
identify-race code path unchanged.
Safety:
- connectTo callers (findAndConnect at network.ts:319 and 330) discard the
returned Connection and use connectTo only for its peer-warmup side effect.
- The actual bitswap stream is opened later in sendMessage via
libp2p.dialProtocol(peerId, BITSWAP_120), which performs its own protocol
negotiation. So a fast-path-returned connection whose bitswap stream is in
fact broken is detected and retried by sendMessage — no invariant is
violated.
Measured effect: in an IPNS-over-pubsub workload where pubsub-mesh warmup
dials the same peers that bitswap later wants providers from, median wantBlock
wall time drops ~17% (3823 -> 3192 ms over n=230 fetches).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Network#connectToawaitsPromise.all([libp2p.dial, raceEvent('peer:identify', filter for BITSWAP_120)]).peer:identifyis single-shot per peer per process. If the peer was already identified before bitswap callsconnectTo(common when the host app dials peers via other subsystems — pubsub, fetch, kad-dht, manualdial), theraceEventwaits for an event that will never re-fire andconnectTohangs indefinitely.This blocks
findAndConnect's find-providers fallback, sowantBlockis forced to time out instead of resolving from already-connected bitswap-speaking peers.Fix
Before the race, check
libp2p.peerStore.get(peer). If the peer is already known and advertisesBITSWAP_120, return from a plaindial()(cheap no-op when already connected). Otherwise fall through to the existing identify-race code path unchanged. Guarded withisPeerId(peer)so the existingPeerId | Multiaddr | Multiaddr[]signature still works.Safety
connectTocallers (findAndConnectat network.ts:319 and 330) discard the returnedConnectionand useconnectToonly for its peer-warmup side effect.sendMessagevialibp2p.dialProtocol(peerId, BITSWAP_120), which performs its own protocol negotiation. So a fast-path-returned connection whose bitswap stream is in fact broken is detected and retried bysendMessage— no invariant is violated.Measured effect
In an IPNS-over-pubsub workload where pubsub-mesh warmup dials the same peers that bitswap later wants providers from, median `wantBlock` wall time drops ~17% (3823 → 3192 ms over n=230 fetches). The improvement matches the theoretical save: the identify-race wait on already-identified peers is eliminated.
Tests
Added two tests in `packages/bitswap/test/network.spec.ts`:
Lint, dep-check, and the full bitswap node test suite (66 tests) all pass.