Qwen 3.5 in a disaggregated (prefill/decode) setup does not appear to work with R3: requests fail with NIXL timeouts.
To reproduce: using the newest miles sglang router (installed manually, since it is not part of the latest miles release) and the latest miles Docker image, run the simple example from #1062 with R3 enabled.
I see that some recent R3 fixes were related to overlap_schedule. We set mamba_scheduler_strategy='extra_buffer' to avoid the warning "Disabling overlap schedule since mamba no_buffer is not compatible with overlap schedule, try to use --disable-radix-cache if overlap schedule is necessary", but we still get NIXL timeouts.
Log summary:
[2026-05-06 02:40:42 DP1 TP1 EP1] Prefill batch, #new-seq: 1, #new-token: 1808, #cached-token: 54784, full token usage: 0.18, mamba usage: 0.04, #running-req: 0, #queue-req: 5, #prealloc-req: 0, #inflight-req: 2, cuda graph: False, input throughput (token/s): 4757.38
[2026-05-06 02:40:42 DP3 TP3 EP3] Prefill batch, #new-seq: 3, #new-token: 4178, #cached-token: 169856, full token usage: 0.16, mamba usage: 0.04, #running-req: 0, #queue-req: 12, #prealloc-req: 1, #inflight-req: 4, cuda graph: False, input throughput (token/s): 5188.67
[2026-05-06 02:40:42 DP0 TP0 EP0] Prefill batch, #new-seq: 3, #new-token: 4197, #cached-token: 173696, full token usage: 0.26, mamba usage: 0.06, #running-req: 0, #queue-req: 7, #prealloc-req: 0, #inflight-req: 4, cuda graph: False, input throughput (token/s): 5043.24
[2026-05-06 02:40:42 DP2 TP2 EP2] Prefill batch, #new-seq: 5, #new-token: 13343, #cached-token: 275840, full token usage: 0.28, mamba usage: 0.07, #running-req: 0, #queue-req: 9, #prealloc-req: 1, #inflight-req: 6, cuda graph: False, input throughput (token/s): 4535.31
[2026-05-06 02:40:42 DP0 TP0 EP0] Decode batch, #running-req: 6, #full token: 2822647, full token usage: 0.55, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.48, #prealloc-req: 0, #transfer-req: 52, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 429.13, #queue-req: 0
[2026-05-06 02:40:42] INFO: 10.125.4.245:36080 - "POST /generate HTTP/1.1" 200 OK
[2026-05-06 02:40:42 DP2 TP2 EP2] Decode batch, #running-req: 10, #full token: 2863674, full token usage: 0.56, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 48, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 370.94, #queue-req: 0
[2026-05-06 02:40:42 DP3 TP3 EP3] Decode batch, #running-req: 8, #full token: 2739238, full token usage: 0.54, mamba num: 55, mamba usage: 0.43, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 47, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 593.71, #queue-req: 0
[2026-05-06 02:40:42 DP1 TP1 EP1] Decode batch, #running-req: 9, #full token: 2766160, full token usage: 0.54, mamba num: 59, mamba usage: 0.46, pre-allocated usage: 0.44, #prealloc-req: 0, #transfer-req: 50, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 768.37, #queue-req: 0
[2026-05-06 02:40:42 DP0 TP0 EP0] Decode batch, #running-req: 9, #full token: 2823004, full token usage: 0.55, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 49, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 764.55, #queue-req: 0
[2026-05-06 02:40:42 DP2 TP2 EP2] Decode batch, #running-req: 10, #full token: 2864074, full token usage: 0.56, mamba num: 58, mamba usage: 0.45, pre-allocated usage: 0.45, #prealloc-req: 0, #transfer-req: 48, #retracted-req: 0, cuda graph: True, gen throughput (token/s): 915.84, #queue-req: 0
...
[2026-05-06 02:45:18 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='fd0749cac0a54368a4f71a77d7b60cca' decode_req.req.bootstrap_room=6963585562773095466 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='26d60c156f7b47fd94c0bf82e5f566e7' decode_req.req.bootstrap_room=438127910756900784 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='3868e00f6701418db14320cb265d92ce' decode_req.req.bootstrap_room=1339616172398772338 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:27 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='e54d5661134e48fea528d9d32877aa44' decode_req.req.bootstrap_room=7339194725951020375 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:38 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='f9713dfe0f254963a9d12aab96f94632' decode_req.req.bootstrap_room=2523274887503196564 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:41 DP2 TP2 EP2] Decode transfer failed for request rank=2 decode_req.req.rid='e57cc32c345e46b19c07f27a463c3f46' decode_req.req.bootstrap_room=221278952829847220 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:52 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='2dcecbd365684ef9aed91aa587d2e088' decode_req.req.bootstrap_room=4620795948407138890 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:55 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='6fc8d76eaa554329a540303e4a41df98' decode_req.req.bootstrap_room=6704161289435070234 with exception NIXL KVReceiver Exception
[2026-05-06 02:45:57 DP3 TP3 EP3] Decode transfer failed for request rank=3 decode_req.req.rid='3f3ef732a944400aadbc9af2a94f5485' decode_req.req.bootstrap_room=3275357675393760023 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:08 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='b72983577b35407e8c006db2fe3b6ec1' decode_req.req.bootstrap_room=1445635835666031729 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:08 DP0 TP0 EP0] Decode transfer failed for request rank=0 decode_req.req.rid='ca1572ba5a8e41b5b92613f1e974ed2b' decode_req.req.bootstrap_room=2814061672592194676 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:11 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='5758f0359fd74b7bac0831fa3bba81b1' decode_req.req.bootstrap_room=4929542802965980655 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:22 DP1 TP1 EP1] Decode transfer failed for request rank=1 decode_req.req.rid='f74b410e8446430aa401ba340bf598a2' decode_req.req.bootstrap_room=9137660748280382090 with exception NIXL KVReceiver Exception
[2026-05-06 02:46:23 DP3 TP3 EP3] Decode transfer failed for request rank=3 decode_req.req.rid='95bf75bbe7a6407e814ce9e89538ad14' decode_req.req.bootstrap_room=8811760169380349256 with exception NIXL KVReceiver Exception
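To summarize a longer log like the one above (failures per DP rank, exception type, and time window), a small parser can help. This is a minimal sketch that assumes the exact line format quoted in this issue; `summarize_failures` is a hypothetical helper, not part of SGLang or miles.

```python
import re
from collections import Counter

# Matches decode-side failure lines in the format quoted in this issue, e.g.
# "[2026-05-06 02:45:18 DP0 TP0 EP0] Decode transfer failed for request rank=0 ..."
FAIL_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) DP(?P<dp>\d+) [^\]]*\] "
    r"Decode transfer failed for request .*?rid='(?P<rid>[0-9a-f]+)'.*"
    r"with exception (?P<exc>.+)$"
)


def summarize_failures(lines):
    """Count failed KV transfers per DP rank and per exception message."""
    per_rank = Counter()
    per_exc = Counter()
    rids = []
    first_ts = last_ts = None
    for line in lines:
        m = FAIL_RE.match(line.strip())
        if not m:
            continue  # skip Prefill/Decode batch stats and HTTP access lines
        per_rank[int(m["dp"])] += 1
        per_exc[m["exc"]] += 1
        rids.append(m["rid"])
        # Timestamps share one fixed-width format, so string order is time order.
        first_ts = first_ts or m["ts"]
        last_ts = m["ts"]
    return {
        "per_rank": dict(per_rank),
        "per_exc": dict(per_exc),
        "rids": rids,
        "first": first_ts,
        "last": last_ts,
    }
```

Usage: `summarize_failures(open("decode.log"))`, where the file name is only an example.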
EDIT: This does not require a weight sync; SGLang simply times out. One theory is that sending the expert data via ZMQ or HTTP adds too much overhead, but I have no proof of that, and the timeouts also occur at relatively low concurrency (16 requests per GPU).
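One way to sanity-check the overhead theory is a back-of-envelope estimate of the raw routed-expert data per request: tokens × MoE layers × top-k × bytes per index. This is a sketch under stated assumptions: `num_moe_layers=48` and `top_k=8` are hypothetical placeholders (not confirmed Qwen 3.5 values), indices are assumed to be int32, and data is assumed to be returned for every prompt and generated token. The token count 56,592 is taken from the first prefill log line (1808 new + 54784 cached tokens, one sequence).

```python
def routed_expert_payload_bytes(num_tokens: int, num_moe_layers: int,
                                top_k: int, bytes_per_id: int = 4) -> int:
    """Raw binary size of one request's routed-expert indices.

    Assumes one index per (token, MoE layer, selected expert); a text
    encoding such as JSON over HTTP would be larger than this.
    """
    return num_tokens * num_moe_layers * top_k * bytes_per_id


# Hypothetical model dimensions; replace with the real model config values.
NUM_MOE_LAYERS = 48
TOP_K = 8

per_request = routed_expert_payload_bytes(1808 + 54784, NUM_MOE_LAYERS, TOP_K)
per_gpu_16 = 16 * per_request  # 16 concurrent requests per GPU, as reported
print(f"{per_request / 1e6:.1f} MB per request, {per_gpu_16 / 1e9:.2f} GB for 16")
```

If the real numbers are in this range, serializing them on the request path could plausibly be slow, but this only bounds the payload size; it does not show that this is what causes the NIXL timeouts.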