[feat] Add Qwen3 MoE true-on-policy parity by maocheng23 · Pull Request #1059 · radixark/miles

maocheng23 · 2026-05-01T07:49:36Z

Summary

Adds the Miles side of the Qwen3-30B-A3B MoE true-on-policy stack on top of radixark/miles:main.

This PR is one of three coupled PRs that must land together because they share the qwen3_moe_true_on_policy_v1 contract and were validated as one end-to-end stack.

Companion PRs:

SGLang: [feat] Add Qwen3 MoE true-on-policy parity sgl-project/sglang#24408
Megatron-LM: [feat] Add Qwen3 MoE true-on-policy parity radixark/Megatron-LM#30

Target

Bit-identical rollout/train logprob parity for Qwen3-30B-A3B MoE under deterministic decode.

SGLang remains the numerical source of truth. Miles owns the launch-plan contract, topology validation, and environment/argument plumbing that keeps SGLang rollout and Megatron training on the same policy.

Main Changes

Adds the qwen3_moe_true_on_policy_v1 contract and Qwen3-MoE launch profile.
Wires deterministic SGLang decode args, EP sizing, CP layout, and MoE contract env vars through the true-on-policy launch plan.
Keeps only the deterministic correctness path; the prefill-only fast-decode/recompute path and the debug/benchmark-only changes have been removed.
Preserves upstream custom-allreduce defaults while disabling the incompatible path only under the deterministic true-on-policy contract.
Updates logprob/loss plumbing for current upstream Miles APIs, including CP slicing with the new ParallelState.cp.rank/size shape.
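
The topology validation mentioned above could look roughly like the sketch below. The actual qwen3_moe_true_on_policy_v1 contract is not shown in this PR description, so every class, field, and rule here (TrainLayout, RolloutLayout, validate_topology, and in particular the "train EP must equal rollout EP" check) is a hypothetical illustration, not the Miles implementation. The example values mirror the EP4 run reported under Validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainLayout:
    # Hypothetical container for the Megatron-side parallel sizes.
    num_gpus: int
    tp: int
    ep: int
    etp: int
    cp: int


@dataclass(frozen=True)
class RolloutLayout:
    # Hypothetical container for the SGLang-side rollout sizes.
    num_gpus: int
    gpus_per_engine: int
    sglang_tp: int
    sglang_ep: int


def validate_topology(train: TrainLayout, rollout: RolloutLayout) -> list[str]:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []
    if train.num_gpus % (train.tp * train.cp) != 0:
        problems.append("train GPUs are not divisible by TP * CP")
    if train.num_gpus % (train.ep * train.etp) != 0:
        problems.append("train GPUs are not divisible by EP * ETP")
    if rollout.num_gpus % rollout.gpus_per_engine != 0:
        problems.append("rollout GPUs are not divisible by GPUs per engine")
    if rollout.sglang_tp != rollout.gpus_per_engine:
        problems.append("SGLang TP does not span the engine's GPUs")
    if rollout.sglang_tp % rollout.sglang_ep != 0:
        problems.append("SGLang EP does not divide SGLang TP")
    # Assumed parity rule: expert sharding matches on both sides.
    if train.ep != rollout.sglang_ep:
        problems.append("train EP differs from rollout EP")
    return problems


train = TrainLayout(num_gpus=8, tp=1, ep=4, etp=1, cp=2)
rollout = RolloutLayout(num_gpus=8, gpus_per_engine=4, sglang_tp=4, sglang_ep=4)
print(validate_topology(train, rollout))
```

Returning a list of problems rather than raising on the first one lets a launcher report every mismatch in a single pass.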

Validation

Latest rebased EP4 E2E on ion7, Qwen3-30B-A3B, 8x H200, run qwen3-moe-top-ep4-rebased-e2e-260522-ion7-r4:

Ray job: raysubmit_yc4WK8aCKzP4tx1Q
Train: TP=1, EP=4, ETP=1, CP=2
Rollout: rollout_num_gpus=8, rollout_num_gpus_per_engine=4, SGLang TP=4, SGLang EP=4
Runtime confirmed: recompute_logprobs_via_prefill=False, use_rollout_logprobs=False
rollout logprob: -2.8848648071289062e-05
ref logprob: -2.8848648071289062e-05
train logprob: -2.8848648071289062e-05
train/train_rollout_logprob_abs_diff = 0.0
train/train_rollout_kl = 0.0
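
As a rough illustration of what exact-zero values for these two metrics imply, the sketch below compares per-token logprobs from rollout and training. How Miles actually computes train/train_rollout_logprob_abs_diff and train/train_rollout_kl is not shown here; the mean absolute difference and the simple mean(rollout - train) KL estimate are assumptions, and parity_metrics and bit_identical are hypothetical helper names. Values are treated as Python float64 for simplicity.

```python
import struct


def bit_identical(a: float, b: float) -> bool:
    """True only if the two float64 values share the same bit pattern."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def parity_metrics(rollout_logprobs, train_logprobs):
    """Compare per-token logprobs of the same sampled tokens."""
    if len(rollout_logprobs) != len(train_logprobs):
        raise ValueError("logprob sequences must have the same length")
    if not rollout_logprobs:
        raise ValueError("need at least one token")
    n = len(rollout_logprobs)
    pairs = list(zip(rollout_logprobs, train_logprobs))
    return {
        # Mean absolute per-token difference (assumed definition).
        "abs_diff": sum(abs(t - r) for r, t in pairs) / n,
        # Sample-mean estimate of KL(rollout || train) on rollout-sampled tokens.
        "kl": sum(r - t for r, t in pairs) / n,
        # Stricter than abs_diff == 0.0: also distinguishes 0.0 from -0.0.
        "bit_identical": all(bit_identical(r, t) for r, t in pairs),
    }


value = -2.8848648071289062e-05
print(parity_metrics([value], [value]))
```

A bitwise comparison is the stronger check: an averaged difference can round to 0.0 in principle, while matching bit patterns on every token cannot hide a discrepancy.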

Local record:
recovery/qwen3_moe_clean/journal/2026-05-22-qwen3-moe-clean-e2e.md

Test Plan

Rebased onto current upstream Miles main.
Local syntax/diff sanity checks for touched Python paths.
Remote import/syntax checks in the ion7 container.
EP4 E2E exact-zero logprob validation on ion7.
CI replay for the three-PR stack.
Longer on/off-policy throughput comparison after reviewer signoff on the cleaned deterministic path.

gemini-code-assist

Code Review

This pull request introduces support for Qwen3 MoE true-on-policy training, adding specialized parallel layouts, kernel policies, and model profiles. Key features include weight and gradient auditing for debugging, optimized weight synchronization for expert parallelism, and improved prefill logprob recomputation using full-sequence scoring and batching. Feedback is provided regarding a logic error in the weight auditing sampling function where small tensors result in tripled statistics due to overlapping slices during concatenation.

gemini-code-assist · 2026-05-01T07:54:34Z

+            sample_size = min(4096, numel)
+            midpoint = max(0, (numel - sample_size) // 2)
+            sample = torch.cat(
+                [
+                    flat[:sample_size],
+                    flat[midpoint : midpoint + sample_size],
+                    flat[-sample_size:],
+                ]
+            )


The current sampling logic for weight auditing can lead to incorrect statistics for small tensors. When numel <= 4096, sample_size becomes numel, and the torch.cat operation results in sample containing three copies of the original flat tensor. This will cause sample_sum to be three times the actual sum, skewing the audit results. To fix this, handle small tensors as a special case and retrieve the sample size from the model configuration instead of hardcoding it.

sample_size = config.audit_sample_size
if numel <= sample_size:
    sample = flat
else:
    midpoint = max(0, (numel - sample_size) // 2)
    sample = torch.cat(
        [
            flat[:sample_size],
            flat[midpoint : midpoint + sample_size],
            flat[-sample_size:],
        ]
    )

References

Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.
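
One detail worth noting about the suggestion above: the head, middle, and tail slices can still overlap whenever numel is less than three times the sample size, so some elements would be counted more than once. A minimal sketch of a variant that never double-counts is shown below. It uses plain Python sequences so it runs without PyTorch; audit_windows and audit_sample are hypothetical names, and falling back to the whole tensor below the 3x threshold is an assumption about what the audit wants, not the PR's behavior.

```python
def audit_windows(numel: int, sample_size: int) -> list[tuple[int, int]]:
    """Return non-overlapping (start, stop) ranges to sample from."""
    if numel < 0 or sample_size <= 0:
        raise ValueError("numel must be >= 0 and sample_size must be > 0")
    if numel <= 3 * sample_size:
        # Three windows would overlap (or exactly tile), so take everything once.
        return [(0, numel)]
    midpoint = (numel - sample_size) // 2
    return [
        (0, sample_size),                   # head
        (midpoint, midpoint + sample_size), # middle
        (numel - sample_size, numel),       # tail
    ]


def audit_sample(flat, sample_size: int = 4096):
    """Collect the sampled elements of a flat sequence without duplicates."""
    sample = []
    for start, stop in audit_windows(len(flat), sample_size):
        sample.extend(flat[start:stop])
    return sample


small = list(range(10))
print(sum(audit_sample(small, sample_size=4)) == sum(small))
```

With a 1-D torch tensor, the same (start, stop) ranges could be sliced and joined with torch.cat instead of building a Python list.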

This was referenced May 1, 2026

[feat] Init true on policy with qwen_moe maocheng23/sglang#3

Closed

[feat] Add Qwen3 MoE true-on-policy parity radixark/Megatron-LM#30

Open

gemini-code-assist Bot reviewed May 1, 2026

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from c8fc984 to eececd6 Compare May 5, 2026 01:51

maocheng23 mentioned this pull request May 7, 2026

[feat] Add SP and PP support for qwen_moe true on policy #1088

Open

5 tasks

maocheng23 force-pushed the feat/true_on_policy_qwen_dense branch 2 times, most recently from 2047f7d to 798b791 Compare May 18, 2026 06:15

Base automatically changed from feat/true_on_policy_qwen_dense to main May 18, 2026 06:21

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch 2 times, most recently from 8a740d7 to 2cd7533 Compare May 19, 2026 18:22

maocheng23 and others added 14 commits May 22, 2026 18:42

Add dense true-on-policy e2e CI gate

b4dc21c

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wire Qwen3 MoE true-on-policy in Miles

8f2d667

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add MoE true-on-policy fast decode flag

1664eb9

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Export launch env before model arg expansion

5218dd0

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove legacy SGLang on-policy target wiring

330da61

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wire MoE true-on-policy rollout environment

b835778

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Clean up MoE true-on-policy deterministic path

ee81633

Finalize Qwen3 MoE deterministic debug path

b9d3b6f

Fix CP slicing parallel state access

308febe

Remove stale policy loss debug dump hook

e52a170

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from fe23383 to e52a170 Compare May 23, 2026 02:47

maocheng23 changed the title ~~[feat] Init true on policy with qwen_moe~~ [feat] Add Qwen3 MoE true-on-policy parity May 23, 2026

maocheng23 mentioned this pull request May 23, 2026

[feat] Add Qwen3 MoE true-on-policy parity sgl-project/sglang#24408

Open

6 tasks

maocheng23 marked this pull request as ready for review May 23, 2026 21:22

maocheng23 requested review from fzyzcjy and yueming-yuan as code owners May 23, 2026 21:22

maocheng23 requested review from Zhichenzzz, guapisolo, jybsuper and yushengsu-thu as code owners May 23, 2026 21:22

maocheng23 wants to merge 14 commits into main from feat/true_on_policy_qwen_moe
