Scope the round-robin load balancer's rotation lock per worker so dispatch throughput scales with worker count

## Description

`RoundRobinLoadBalancer.dispatch` holds a single global `asyncio.Lock` across `await connection.dispatch(...)`, so the dispatch/handshake phase of every concurrent dispatch is serialized through one lock. Dispatch throughput therefore does not scale with worker count: in the measurement below it declines as workers are added.

Multi-worker prototype measurement (round-robin, empty payload, 64 concurrent dispatches, Python connection, ops/s):

| lock | N=1 | N=2 | N=4 | scaling 1→4 |
|---|---|---|---|---|
| global (current) | 1470 | 1327 | 1175 | 0.80× |
| per-worker | 1492 | 2140 | 2985 | 2.00× |

Adding workers under the current lock makes throughput *worse*; a per-worker lock scales ~2× at 4 workers (~2.5× the global lock's N=4 throughput).

## Expected behavior

Adding workers increases dispatch throughput (scaling toward N× until a caller-side limit is reached), while preserving the load balancer's existing anti-thundering-herd guarantee: a burst of concurrent dispatches must not stampede a single worker.

## Root cause

The lock is deliberate load-shaping, not merely rotation-index protection. The rotation index only advances on success *after* the handshake, so without serialization a burst of concurrent dispatches all read the same stale index and target the same worker (thundering herd). One global lock held across the handshake prevents that — but over-broadly: it also serializes dispatches to *different* workers, which need not be serialized.

The fix is a **per-worker lock**: hold a brief lock only to advance the rotation index, then acquire a lock scoped to the *selected worker* across its handshake. Dispatches to different workers overlap; dispatches to the same worker serialize — cross-worker parallelism with the anti-herd guarantee intact.
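The two-lock shape described above can be sketched as follows. This is a minimal illustration, not the actual `RoundRobinLoadBalancer` code: `FakeConnection`, `PerWorkerLockBalancer`, the `uid` attribute layout, and the `demo` helper are all hypothetical names, and the handshake is simulated with `asyncio.sleep`.

```python
import asyncio
from collections import Counter


class FakeConnection:
    """Hypothetical stand-in for a worker connection; records handshake overlap."""

    def __init__(self, uid, delay=0.01):
        self.uid = uid
        self.delay = delay
        self.in_flight = 0
        self.max_in_flight = 0

    async def dispatch(self, payload):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(self.delay)  # simulated handshake
        self.in_flight -= 1
        return self.uid


class PerWorkerLockBalancer:
    """Sketch only: brief index lock, then a lock scoped to the selected worker."""

    def __init__(self, connections):
        self._connections = list(connections)
        self._index = 0
        self._index_lock = asyncio.Lock()
        self._worker_locks = {c.uid: asyncio.Lock() for c in self._connections}

    async def dispatch(self, payload):
        # Held only long enough to pick a worker and advance the rotation index.
        async with self._index_lock:
            connection = self._connections[self._index % len(self._connections)]
            self._index += 1
        # Held across the handshake, but only for this worker: dispatches to
        # other workers proceed concurrently.
        async with self._worker_locks[connection.uid]:
            return await connection.dispatch(payload)


async def demo(n_workers=4, n_dispatches=64):
    # Built inside the running loop so the locks belong to that loop.
    connections = [FakeConnection(f"w{i}") for i in range(n_workers)]
    balancer = PerWorkerLockBalancer(connections)
    results = await asyncio.gather(
        *(balancer.dispatch(i) for i in range(n_dispatches))
    )
    return connections, Counter(results)
```

Unlike the current behavior, where the index advances only after a successful handshake, this sketch advances it at selection time. That is what lets a burst spread across workers without holding a global lock for the whole handshake.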

Narrowing the lock makes index advancement concurrent, so exhaustion detection must be reworked. The current per-call `checkpoint` relies on rotation-index identity to detect "tried every worker". Once concurrent dispatches share and advance the same index, that check is no longer reliable: a single dispatch can skip or miss workers when workers are evicted from a multi-worker pool. Track a per-dispatch tried-set of attempted worker uids instead of relying on index identity.
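One way to express tried-set exhaustion detection is sketched below. It assumes a hypothetical `WorkerUnavailable` failure signal, a hypothetical `AllWorkersExhausted` error, and an eviction policy that simply drops a failed worker from the list; none of these names come from the existing code.

```python
import asyncio


class WorkerUnavailable(Exception):
    """Hypothetical failure raised by a worker handshake."""


class AllWorkersExhausted(Exception):
    """Hypothetical error: this dispatch has attempted every live worker."""


class Conn:
    """Hypothetical connection that either succeeds or always fails."""

    def __init__(self, uid, healthy=True):
        self.uid = uid
        self.healthy = healthy

    async def dispatch(self, payload):
        await asyncio.sleep(0)  # yield, standing in for the handshake
        if not self.healthy:
            raise WorkerUnavailable(self.uid)
        return self.uid


class TriedSetBalancer:
    def __init__(self, connections):
        self._connections = list(connections)
        self._index = 0
        self._index_lock = asyncio.Lock()
        self._worker_locks = {c.uid: asyncio.Lock() for c in self._connections}

    async def _next_untried(self, tried):
        # Brief lock: scan at most one full rotation for a worker whose uid
        # this particular dispatch has not attempted yet.
        async with self._index_lock:
            n = len(self._connections)
            for _ in range(n):
                conn = self._connections[self._index % n]
                self._index += 1
                if conn.uid not in tried:
                    return conn
            return None

    async def dispatch(self, payload):
        tried = set()  # per-dispatch state, independent of the shared index
        while True:
            conn = await self._next_untried(tried)
            if conn is None:
                raise AllWorkersExhausted(sorted(tried))
            tried.add(conn.uid)
            async with self._worker_locks[conn.uid]:
                try:
                    return await conn.dispatch(payload)
                except WorkerUnavailable:
                    # Assumed eviction policy: drop the failed worker.
                    if conn in self._connections:
                        self._connections.remove(conn)


async def run_burst(health, n_dispatches=8):
    conns = [Conn(f"w{i}", healthy=h) for i, h in enumerate(health)]
    balancer = TriedSetBalancer(conns)
    return await asyncio.gather(
        *(balancer.dispatch(i) for i in range(n_dispatches))
    )
```

Because exhaustion is decided from the set of uids a dispatch has attempted, it no longer matters how far other concurrent dispatches or evictions have moved the shared index in the meantime.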

Note: increased handshake concurrency can surface a separate worker-side `proxy_pool` "Lock is bound to a different event loop" race — validate against it (or file separately) when landing this.

