
fix(bitswap): skip identify race when peer protocols are already known #1040

Open: Rinse12 wants to merge 1 commit into ipfs:main from Rinse12:fix/bitswap-connect-to-identify-race

Rinse12 commented May 14, 2026

Problem

`Network#connectTo` awaits `Promise.all([libp2p.dial(...), raceEvent('peer:identify', ...)])`, where the `raceEvent` filter waits for an identify result that advertises `BITSWAP_120`. `peer:identify` is single-shot per peer per process. If the peer was already identified before bitswap calls `connectTo` (common when the host app dials peers through other subsystems such as pubsub, fetch, kad-dht, or a manual dial), `raceEvent` waits for an event that will never fire again, and `connectTo` hangs indefinitely.

This blocks the find-providers fallback in `findAndConnect`, so `wantBlock` is forced to time out instead of resolving from peers that are already connected and speak bitswap.
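The hang comes down to ordinary event semantics: a listener only sees events dispatched after it was registered. The sketch below reproduces that with Node's global `EventTarget`. `waitFor` is a hypothetical stand-in for `raceEvent`, and `withTimeout` is a hypothetical helper added only so the demo terminates; this illustrates the failure mode and is not bitswap code.

```typescript
// Hypothetical stand-in for raceEvent: resolves on the NEXT dispatch of `type`.
// Like any event listener, it cannot observe events dispatched before it was added.
function waitFor (target: EventTarget, type: string): Promise<Event> {
  return new Promise<Event>(resolve => {
    target.addEventListener(type, (evt) => { resolve(evt) }, { once: true })
  })
}

// Hypothetical helper so the demo terminates: resolves to 'timeout' after `ms`.
async function withTimeout (p: Promise<Event>, ms: number): Promise<Event | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<'timeout'>(resolve => {
    timer = setTimeout(() => { resolve('timeout') }, ms)
  })
  try {
    return await Promise.race([p, timeout])
  } finally {
    clearTimeout(timer)
  }
}

async function demo (): Promise<{ lateListener: string, earlyListener: string }> {
  const events = new EventTarget()

  // Identify already ran (e.g. pubsub dialed the peer first), then we subscribe.
  events.dispatchEvent(new Event('peer:identify'))
  const late = await withTimeout(waitFor(events, 'peer:identify'), 50)

  // Subscribe first, then identify runs: the listener sees the event.
  const pending = waitFor(events, 'peer:identify')
  events.dispatchEvent(new Event('peer:identify'))
  const early = await withTimeout(pending, 50)

  return {
    lateListener: late === 'timeout' ? 'timeout' : 'resolved',
    earlyListener: early === 'timeout' ? 'timeout' : 'resolved'
  }
}

demo().then(result => { console.log(result) }, err => { console.error(err) })
```

The late listener times out while the early one resolves, which mirrors `connectTo` subscribing to `peer:identify` after another subsystem has already triggered identify for that peer.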

Fix

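Before the race, check `libp2p.peerStore.get(peer)`. If the peer is already known and advertises `BITSWAP_120`, return the result of a plain `dial()`, which is a cheap no-op when a connection already exists. Otherwise (unknown peer, peer without bitswap, or a rejected `peerStore.get`), fall through to the existing identify-race code path unchanged. The fast path is guarded with `isPeerId(peer)` because the peer store is keyed by PeerId; `Multiaddr` and `Multiaddr[]` arguments skip it, so the existing `PeerId | Multiaddr | Multiaddr[]` signature still works.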
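A minimal sketch of the fast-path decision, assuming simplified, hypothetical stand-ins for the libp2p types (`PeerId`, `Multiaddr`, `Connection`, `Libp2pLike`) and a local `isPeerId` guard. The existing slow path is passed in as a hypothetical `identifyRace` callback rather than reproduced, and the protocol string assigned to `BITSWAP_120` is an assumption. This shows only the control flow described above, not the patch itself.

```typescript
// Hypothetical, simplified stand-ins for the real libp2p types.
interface PeerId { readonly type: 'peer-id', readonly id: string }
interface Multiaddr { readonly type: 'multiaddr', readonly addr: string }
type DialTarget = PeerId | Multiaddr | Multiaddr[]
interface Connection { readonly remote: string }

interface Libp2pLike {
  dial: (target: DialTarget) => Promise<Connection>
  peerStore: { get: (peerId: PeerId) => Promise<{ protocols: string[] }> }
}

// Assumed value of the bitswap 1.2.0 protocol id.
const BITSWAP_120 = '/ipfs/bitswap/1.2.0'

// Local type guard standing in for isPeerId.
function isPeerId (target: DialTarget): target is PeerId {
  return !Array.isArray(target) && target.type === 'peer-id'
}

async function connectTo (
  libp2p: Libp2pLike,
  target: DialTarget,
  // Hypothetical stand-in for the existing slow path (dial + raceEvent on peer:identify).
  identifyRace: (target: DialTarget) => Promise<Connection>
): Promise<Connection> {
  if (isPeerId(target)) {
    let protocols: string[] = []
    try {
      protocols = (await libp2p.peerStore.get(target)).protocols
    } catch {
      // Unknown peer: the peer store has no record yet, so use the slow path.
    }
    if (protocols.includes(BITSWAP_120)) {
      // Already identified as a bitswap peer; dial() reuses an open connection.
      return await libp2p.dial(target)
    }
  }
  // Multiaddr targets, unknown peers, and non-bitswap peers keep the old behaviour.
  return await identifyRace(target)
}
```

In this sketch the `peerStore.get` lookup and the `dial` sit in separate steps, so a failing `dial` on the fast path surfaces as an error instead of silently falling through to the identify race.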

Safety

  • `connectTo` callers (`findAndConnect` at network.ts:319 and 330) discard the returned `Connection` and use `connectTo` only for its side effect of warming up a connection to the peer.
  • The actual bitswap stream is opened later in `sendMessage` via `libp2p.dialProtocol(peerId, BITSWAP_120)`, which performs its own protocol negotiation. If a connection returned by the fast path cannot in fact carry a bitswap stream, `sendMessage` detects the failure and retries, so no invariant is violated.

Measured effect

In an IPNS-over-pubsub workload where pubsub-mesh warmup dials the same peers that bitswap later connects to as providers, median `wantBlock` wall time drops by ~17% (3823 → 3192 ms over n=230 fetches). This is consistent with the expected saving: the identify-race wait on already-identified peers is eliminated.
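As a check on the headline figure, using only the two reported medians:

$$
\frac{3823\,\text{ms} - 3192\,\text{ms}}{3823\,\text{ms}} = \frac{631}{3823} \approx 0.165
$$

That is a 631 ms reduction at the median, about 16.5%, which rounds to the ~17% stated.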

Tests

Added two tests in `packages/bitswap/test/network.spec.ts`:

  • "should not wait for peer:identify when the peer is already known to speak bitswap" — stubs `peerStore.get` to return a peer whose `protocols` already contain `BITSWAP_120`, calls `connectTo`, and asserts the call resolves without subscribing to `peer:identify`. Pre-patch this test would hang because no `peer:identify` event is dispatched.
  • "should fall through to the identify race when the peer is not yet in peerStore" — stubs `peerStore.get` to reject, simulates `peer:identify` firing after `dial`, and asserts the slow-path subscription is still installed. Guards against the fast-path swallowing the unknown-peer case.
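The two tests can be approximated without a real libp2p node. The sketch below assumes a hypothetical `createFakeLibp2p` whose `peerStore.get` rejects for unknown peers and whose `onIdentify`/`emitIdentify` pair replaces the real `peer:identify` event while counting subscriptions. `connectTo` here is a compact, hypothetical reconstruction of the patched control flow, not the code in `network.ts`, and the `BITSWAP_120` string is assumed.

```typescript
// Hypothetical stand-ins; not the real @libp2p/interface types.
interface PeerId { id: string }
interface Connection { remotePeer: string }
type IdentifyListener = (peer: PeerId, protocols: string[]) => void

interface FakeLibp2p {
  identifySubscriptions: number
  dial: (peer: PeerId) => Promise<Connection>
  peerStore: { get: (peer: PeerId) => Promise<{ protocols: string[] }> }
  onIdentify: (listener: IdentifyListener) => void
  emitIdentify: (peer: PeerId, protocols: string[]) => void
}

const BITSWAP_120 = '/ipfs/bitswap/1.2.0' // assumed protocol id

// Fake node: peerStore.get rejects for unknown peers (like the slow-path test stub),
// and every identify subscription is counted so a test can assert on it.
function createFakeLibp2p (known: Map<string, string[]>): FakeLibp2p {
  const listeners: IdentifyListener[] = []
  const node: FakeLibp2p = {
    identifySubscriptions: 0,
    dial: async (peer) => ({ remotePeer: peer.id }),
    peerStore: {
      get: async (peer) => {
        const protocols = known.get(peer.id)
        if (protocols === undefined) throw new Error(`peer ${peer.id} not found`)
        return { protocols }
      }
    },
    onIdentify: (listener) => {
      node.identifySubscriptions++
      listeners.push(listener)
    },
    emitIdentify: (peer, protocols) => {
      for (const listener of listeners) listener(peer, protocols)
    }
  }
  return node
}

// Compact, hypothetical reconstruction of the patched connectTo control flow.
async function connectTo (libp2p: FakeLibp2p, peer: PeerId): Promise<Connection> {
  const protocols = await libp2p.peerStore.get(peer).then(p => p.protocols, (): string[] => [])
  if (protocols.includes(BITSWAP_120)) {
    return await libp2p.dial(peer) // fast path: never subscribes to identify
  }
  const identified = new Promise<void>(resolve => {
    libp2p.onIdentify((p, protos) => {
      if (p.id === peer.id && protos.includes(BITSWAP_120)) resolve()
    })
  })
  const [connection] = await Promise.all([libp2p.dial(peer), identified])
  return connection
}

// Runs both scenarios and reports how many identify subscriptions each made.
async function runScenarios (): Promise<{ knownPeer: number, unknownPeer: number }> {
  const warm = createFakeLibp2p(new Map([['peerA', [BITSWAP_120]]]))
  await connectTo(warm, { id: 'peerA' }) // would hang without the fast path: nothing emits identify

  const cold = createFakeLibp2p(new Map())
  const pending = connectTo(cold, { id: 'peerB' })
  await new Promise<void>(resolve => { setTimeout(() => { resolve() }, 0) }) // let connectTo subscribe
  cold.emitIdentify({ id: 'peerB' }, [BITSWAP_120]) // identify fires after dial
  await pending

  return { knownPeer: warm.identifySubscriptions, unknownPeer: cold.identifySubscriptions }
}

runScenarios().then(counts => { console.log(counts) }, err => { console.error(err) })
```

Counting subscriptions, rather than only awaiting the call, is what distinguishes "resolved via the fast path" from "resolved because an identify event happened to arrive".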

Lint, dep-check, and the full bitswap node test suite (66 tests) all pass.

Rinse12 requested a review from a team as a code owner May 14, 2026 06:35