Scope the round-robin load balancer's rotation lock per worker so dispatch throughput scales with worker count #263

Description

@conradbzura

RoundRobinLoadBalancer.dispatch holds a single global asyncio.Lock across await connection.dispatch(...), so the dispatch/handshake phase of every concurrent dispatch is serialized through one lock. Dispatch throughput therefore does not scale with worker count — it is flat-to-declining as workers are added.
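The serialization can be illustrated with a minimal sketch. It assumes a simplified balancer: `FakeConnection`, `GlobalLockBalancer`, and the 1 ms simulated handshake are hypothetical stand-ins, not the project's actual classes. It records how many handshakes are ever in flight at once when one lock is held across the awaited dispatch.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in for a worker connection."""

    in_flight = 0       # handshakes currently in progress, across all workers
    max_in_flight = 0   # high-water mark of in_flight

    async def dispatch(self, payload):
        cls = FakeConnection
        cls.in_flight += 1
        cls.max_in_flight = max(cls.max_in_flight, cls.in_flight)
        await asyncio.sleep(0.001)  # simulated handshake latency (assumption)
        cls.in_flight -= 1
        return payload


class GlobalLockBalancer:
    """Simplified model of the current behavior: one lock spans the handshake."""

    def __init__(self, connections):
        self._connections = connections
        self._index = 0
        self._lock = asyncio.Lock()

    async def dispatch(self, payload):
        async with self._lock:
            connection = self._connections[self._index]
            result = await connection.dispatch(payload)
            # The index advances only after a successful handshake.
            self._index = (self._index + 1) % len(self._connections)
            return result


async def run_global_lock_demo():
    balancer = GlobalLockBalancer([FakeConnection() for _ in range(4)])
    await asyncio.gather(*(balancer.dispatch(i) for i in range(64)))
    return FakeConnection.max_in_flight


max_in_flight_global = asyncio.run(run_global_lock_demo())
print(max_in_flight_global)
```

In this model, at most one handshake is in flight regardless of how many workers exist, which is why adding workers cannot raise throughput.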

Multi-worker prototype measurement (round-robin, empty payload, 64 concurrent dispatches, Python connection, ops/s):

| lock | N=1 | N=2 | N=4 | scaling 1→4 |
|---|---|---|---|---|
| global (current) | 1470 | 1327 | 1175 | 0.80× |
| per-worker | 1492 | 2140 | 2985 | 2.00× |

Adding workers under the current lock makes throughput worse; a per-worker lock scales ~2× at 4 workers (~2.5× the global lock's N=4 throughput).
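For reference, the ratios quoted above follow directly from the measured ops/s figures. A short check, using only the numbers in the measurement (the dictionary layout is just for illustration):

```python
# Measured ops/s from the prototype, keyed by lock strategy and worker count N.
throughput = {
    "global": {1: 1470, 2: 1327, 4: 1175},
    "per-worker": {1: 1492, 2: 2140, 4: 2985},
}

# Scaling from N=1 to N=4 for each strategy.
scaling = {name: round(ops[4] / ops[1], 2) for name, ops in throughput.items()}

# Per-worker throughput relative to the global lock at N=4.
per_worker_vs_global_at_4 = round(
    throughput["per-worker"][4] / throughput["global"][4], 2
)

print(scaling, per_worker_vs_global_at_4)
```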

Expected behavior

Adding workers increases dispatch throughput (scaling toward N× until a caller-side limit is reached), while preserving the load balancer's existing anti-thundering-herd guarantee: a burst of concurrent dispatches must not stampede a single worker.

Root cause

The lock is deliberate load-shaping, not merely rotation-index protection. The rotation index only advances on success after the handshake, so without serialization a burst of concurrent dispatches all read the same stale index and target the same worker (thundering herd). One global lock held across the handshake prevents that — but over-broadly: it also serializes dispatches to different workers, which need not be serialized.
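The stale-index stampede can be reproduced in a few lines. This is a sketch under stated assumptions: the worker names, the `dispatch_unlocked` helper, and the `asyncio.sleep(0)` stand-in for the handshake are all hypothetical. It models only the property described above, namely that the index advances after the handshake rather than before it.

```python
import asyncio
from collections import Counter


async def run_herd_demo():
    workers = ["w0", "w1", "w2", "w3"]
    index = 0
    hits = Counter()

    async def dispatch_unlocked():
        nonlocal index
        target = index                # every concurrent caller reads the same value
        hits[workers[target]] += 1
        await asyncio.sleep(0)        # simulated handshake; yields to other tasks
        index = (target + 1) % len(workers)  # advance only after success

    # A burst of 64 concurrent dispatches with no serialization at all.
    await asyncio.gather(*(dispatch_unlocked() for _ in range(64)))
    return hits


herd_hits = asyncio.run(run_herd_demo())
print(herd_hits)
```

Every task in the burst reads the index before any task finishes its handshake, so all 64 dispatches land on the first worker.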

The fix is a per-worker lock: hold a brief lock only to advance the rotation index, then acquire a lock scoped to the selected worker across its handshake. Dispatches to different workers overlap; dispatches to the same worker serialize — cross-worker parallelism with the anti-herd guarantee intact.
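A minimal sketch of that two-lock shape follows. The `Worker` and `PerWorkerLockBalancer` classes, their attributes, and the 1 ms simulated handshake are hypothetical and not taken from the codebase; error handling and eviction are omitted.

```python
import asyncio


class Worker:
    """Hypothetical worker with its own handshake lock and counters."""

    total_in_flight = 0
    total_peak = 0  # peak handshakes in flight across all workers

    def __init__(self, uid):
        self.uid = uid
        self.lock = asyncio.Lock()
        self.in_flight = 0
        self.peak = 0      # peak handshakes in flight on this worker
        self.handled = 0

    async def handshake(self, payload):
        self.in_flight += 1
        Worker.total_in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        Worker.total_peak = max(Worker.total_peak, Worker.total_in_flight)
        await asyncio.sleep(0.001)  # simulated handshake latency (assumption)
        self.in_flight -= 1
        Worker.total_in_flight -= 1
        self.handled += 1
        return payload


class PerWorkerLockBalancer:
    def __init__(self, workers):
        self._workers = workers
        self._index = 0
        self._rotation_lock = asyncio.Lock()

    async def dispatch(self, payload):
        # Brief lock: pick a worker and advance the rotation, nothing awaited.
        async with self._rotation_lock:
            worker = self._workers[self._index]
            self._index = (self._index + 1) % len(self._workers)
        # Worker-scoped lock: serializes handshakes to this worker only.
        async with worker.lock:
            return await worker.handshake(payload)


async def run_per_worker_demo():
    workers = [Worker(f"w{i}") for i in range(4)]
    balancer = PerWorkerLockBalancer(workers)
    await asyncio.gather(*(balancer.dispatch(i) for i in range(64)))
    return workers


workers_after = asyncio.run(run_per_worker_demo())
print([w.handled for w in workers_after], Worker.total_peak)
```

In this model each worker never sees more than one handshake at a time, while up to four handshakes (one per worker) overlap. Because nothing is awaited inside the rotation lock here, it is not strictly needed in single-threaded asyncio; it is kept to mirror the design described above.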

Narrowing the lock changes the concurrency of index advancement, so exhaustion detection must be reworked. The current per-call checkpoint relies on rotation-index identity to detect "tried every worker"; once concurrent dispatches share and advance the index, a single dispatch can skip or miss workers when workers are evicted from a multi-worker pool. Track a per-dispatch tried-set of attempted worker uids instead of relying on index identity.
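One possible shape for tried-set exhaustion detection is sketched below. `Pool`, `NoWorkersAvailable`, `dispatch_with_tried_set`, and the use of `ConnectionError` to signal a failed handshake are all assumptions made for illustration, not the project's API.

```python
import asyncio


class NoWorkersAvailable(Exception):
    """Hypothetical error raised once every live worker has been attempted."""


class Pool:
    """Hypothetical shared rotation over worker uids."""

    def __init__(self, uids):
        self.uids = list(uids)
        self.index = 0

    def next_uid(self, exclude):
        # Walk at most one full rotation, skipping uids already attempted.
        for _ in range(len(self.uids)):
            uid = self.uids[self.index % len(self.uids)]
            self.index = (self.index + 1) % len(self.uids)
            if uid not in exclude:
                return uid
        return None

    def evict(self, uid):
        if uid in self.uids:
            self.uids.remove(uid)


async def dispatch_with_tried_set(pool, handshake, payload):
    tried = set()  # per-dispatch record of attempted uids, independent of index
    while True:
        uid = pool.next_uid(exclude=tried)
        if uid is None:
            raise NoWorkersAvailable(f"attempted: {sorted(tried)}")
        tried.add(uid)
        try:
            return await handshake(uid, payload)
        except ConnectionError:
            pool.evict(uid)  # eviction shifts positions; the tried-set is unaffected


async def run_tried_set_demo():
    attempts = []

    async def always_fails(uid, payload):
        attempts.append(uid)
        raise ConnectionError(uid)

    try:
        await dispatch_with_tried_set(Pool(["a", "b", "c"]), always_fails, "job")
    except NoWorkersAvailable:
        return attempts
    return None


failed_attempts = asyncio.run(run_tried_set_demo())
print(failed_attempts)
```

Because the set is keyed by uid rather than by position, the dispatch attempts each worker at most once and terminates even when evictions or other dispatches move the shared index between attempts.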

Note: increased handshake concurrency can surface a separate worker-side proxy_pool "Lock is bound to a different event loop" race — validate against it (or file separately) when landing this.

Labels

refactor: Code restructuring without behavior change
