R3 not working for Qwen3.5 PD disaggregated

Qwen3.5 in a PD-disaggregated setup does not appear to work with R3: requests fail with NIXL timeouts.

Example: with the newest miles SGLang router (installed manually, since it is not part of the latest miles) and the latest miles Docker image, run the simple example from https://github.com/radixark/miles/pull/1062 with R3 enabled.

I see that some recent R3 fixes were related to `overlap_schedule`. Setting `mamba_scheduler_strategy='extra_buffer'` avoids the warning `Disabling overlap schedule since mamba no_buffer is not compatible with overlap schedule, try to use --disable-radix-cache if overlap schedule is necessary`, but we still get NIXL timeouts.

Log summary:
```
[2026-05-06 02:40:42 DP1 TP1 EP1] Prefill batch, #new-seq: 1, #new-token: 1808, #cached-token: 54784, full token usage: 0.18, mamba usage: 0.04, #running-req: 0, #queue-req: 5, #prealloc-req: 0, #inflight-req: 2, cuda graph: False, input throughput (token/s): 4757.38
[2026-05-06 02:40:42 DP3 TP3 EP3] Prefill batch, #new-seq: 3, #new-token: 4178, #cached-token: 169856, full token usage: 0.16, mamba usage: 0.04, #running-req: 0, #queue-req: 12, #prealloc-req: 1, #inflight-req: 4, cuda graph: False, input throughput (token/s): 5188.67
[2026-05-06 02:40:42 DP0 TP0 EP0] Prefill batch, #new-seq: 3, #new-token: 4197, #cached-token: 173696, full token usage: 0.26, mamba usage: 0.06, #running-req: 0, #queue-req: 7, #prealloc-req: 0, #inflight-req: 4, cuda graph: False, input throughput (token/s): 5043.24
[2026-05-06 02:40:42 DP2 TP2 EP2] Prefill batch, #new-seq: 5, #new-token: 13343, #cached-token: 275840, full token usage: 0.28, mamba usage: 0.07, #running-req: 0, #queue-req: 9, #prealloc-req: 1, #inflight-req: 6, cuda graph: False, input throughput (token/s): 4535.31
[2026-05-06 02:40:42 DP0 TP0 EP0] Decode batch, #running-req: 6, #full token: 2822647, full token usage: 0.55, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.48, #prealloc-req: 0, #transfer-req: 52, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 429.13, #queue-req: 0
[2026-05-06 02:40:42] INFO: 10.125.4.245:36080 - "POST /generate HTTP/1.1" 200 OK
[2026-05-06 02:40:42 DP2 TP2 EP2] Decode batch, #running-req: 10, #full token: 2863674, full token usage: 0.56, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 48, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 370.94, #queue-req: 0
[2026-05-06 02:40:42 DP3 TP3 EP3] Decode batch, #running-req: 8, #full token: 2739238, full token usage: 0.54, mamba num: 55, mamba usage: 0.43, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 47, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 593.71, #queue-req: 0
[2026-05-06 02:40:42 DP1 TP1 EP1] Decode batch, #running-req: 9, #full token: 2766160, full token usage: 0.54, mamba num: 59, mamba usage: 0.46, pre-allocated usage: 0.44, #prealloc-req: 0, #transfer-req: 50, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 768.37, #queue-req: 0
[2026-05-06 02:40:42 DP0 TP0 EP0] Decode batch, #running-req: 9, #full token: 2823004, full token usage: 0.55, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 49, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 764.55, #queue-req: 0
[2026-05-06 02:40:42 DP2 TP2 EP2] Decode batch, #running-req: 10, #full token: 2864074, full token usage: 0.56, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 48, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 915.84, #queue-req: 0
...
[2026-05-06 02:45:18 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='fd0749cac0a54368a4f71a77d7b60cca' decode_req.req.bootstrap_room=6963585562773095466 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='26d60c156f7b47fd94c0bf82e5f566e7' decode_req.req.bootstrap_room=438127910756900784 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='3868e00f6701418db14320cb265d92ce' decode_req.req.bootstrap_room=1339616172398772338 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='e54d5661134e48fea528d9d32877aa44' decode_req.req.bootstrap_room=7339194725951020375 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:38 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='f9713dfe0f254963a9d12aab96f94632' decode_req.req.bootstrap_room=2523274887503196564 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:41 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='e57cc32c345e46b19c07f27a463c3f46' decode_req.req.bootstrap_room=221278952829847220 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:52 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='2dcecbd365684ef9aed91aa587d2e088' decode_req.req.bootstrap_room=4620795948407138890 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:55 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='6fc8d76eaa554329a540303e4a41df98' decode_req.req.bootstrap_room=6704161289435070234 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:57 DP3 TP3 EP3] Decode transfer failed for request rank=3 decode_req.req.rid='3f3ef732a944400aadbc9af2a94f5485' decode_req.req.bootstrap_room=3275357675393760023 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:08 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='b72983577b35407e8c006db2fe3b6ec1' decode_req.req.bootstrap_room=1445635835666031729 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:08 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='ca1572ba5a8e41b5b92613f1e974ed2b' decode_req.req.bootstrap_room=2814061672592194676 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:11 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='5758f0359fd74b7bac0831fa3bba81b1' decode_req.req.bootstrap_room=4929542802965980655 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:22 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='f74b410e8446430aa401ba340bf598a2' decode_req.req.bootstrap_room=9137660748280382090 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:23 DP3 TP3 EP3] Decode transfer failed for request rank=3 decode_req.req.rid='95bf75bbe7a6407e814ce9e89538ad14' decode_req.req.bootstrap_room=8811760169380349256 with exception NIXL KVReceiver Exception

```
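
For anyone trying to reproduce this, a small script can help show whether the failures cluster on one decode rank or are spread evenly. This is a minimal sketch that only assumes the log line format shown above; the function name `summarize_failures` is made up for illustration and is not part of miles or SGLang.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# [2026-05-06 02:45:18 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='...'
FAIL_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [^\]]*\] "
    r"Decode transfer failed for request rank=(?P<rank>\d+) "
    r".*?rid='(?P<rid>[0-9a-f]+)'"
)


def summarize_failures(lines):
    """Return (failures per rank, seconds between first and last failure)."""
    per_rank = Counter()
    stamps = []
    for line in lines:
        m = FAIL_RE.match(line)
        if not m:
            continue  # skip prefill/decode batch lines and HTTP access logs
        per_rank[int(m["rank"])] += 1
        stamps.append(datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"))
    span = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return dict(per_rank), span
```

Feeding it the decode server log (for example `summarize_failures(open(path))`, where `path` is wherever the log was saved) gives a per-rank failure count and the time window over which the failures occurred.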

EDIT: This does not require a weight sync; SGLang simply times out. One theory is that sending the expert data via ZMQ or HTTP adds too much overhead. However, I have no proof of that, and it also happens at relatively low concurrency (16 per GPU).
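
To sanity-check the overhead theory, here is a rough back-of-envelope estimate of the raw routing payload per request. It is a sketch under stated assumptions: it assumes R3 ships the top-k expert indices for every token at every MoE layer, and the layer count, top-k, and id width below are placeholders, not values taken from the Qwen3.5 config. The helper name is hypothetical.

```python
def routed_expert_payload_bytes(num_tokens, num_moe_layers, top_k, bytes_per_id=4):
    """Raw size of top-k expert ids for every token at every MoE layer."""
    return num_tokens * num_moe_layers * top_k * bytes_per_id


# Token count of the single-sequence prefill batch in the log above
# (#new-token: 1808, #cached-token: 54784).
tokens = 54784 + 1808

# Placeholder values, NOT Qwen3.5's real config: substitute from the model config.
size = routed_expert_payload_bytes(tokens, num_moe_layers=40, top_k=8)
print(f"{size / 2**20:.1f} MiB per request before any serialization overhead")
```

If the real numbers put this in the tens of MiB per request, serialization and transport cost would be worth profiling; if they come out far smaller, the timeouts more likely have another cause.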


R3 not working for Qwen3.5 PD disaggregated #1085
