[feat] Add Qwen3 MoE true-on-policy parity by maocheng23 · Pull Request #30 · radixark/Megatron-LM

maocheng23 · 2026-05-01T07:48:43Z

Summary

Adds the Megatron-LM side of the Qwen3-30B-A3B MoE true-on-policy stack.

This PR is one of three coupled PRs that must land together because they share the qwen3_moe_true_on_policy_v1 contract and were validated as one end-to-end stack.

Companion PRs:

Target

Bit-identical rollout/train logprob parity for Qwen3-30B-A3B MoE under deterministic decode, with differentiable Megatron backward.

SGLang remains the numerical source of truth. Megatron reproduces the SGLang MoE numerics in training, including deterministic routing/top-k and EP reduction behavior required by the contract.
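Two requirements in the paragraph above, deterministic top-k and a fixed reduction order, can be illustrated with a small pure-Python sketch. It is not the kernel logic from this PR: the function names, the lower-index tie-break rule, and the left-to-right reduction order are assumptions chosen only to show why both sides must agree on these details to get bit-identical results.

```python
from typing import List, Sequence


def deterministic_topk(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k largest scores; ties go to the lower index.

    The tie-break rule is an assumption for illustration. What matters is
    that rollout and training apply the same rule.
    """
    # Sorting on (-score, index) removes any dependence on how equal
    # scores happen to be ordered internally.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


def reduce_in_fixed_order(partials: Sequence[float]) -> float:
    """Sum partial results strictly left to right."""
    total = 0.0
    for value in partials:
        total += value
    return total


# Experts 1 and 2 have equal router scores; the index breaks the tie.
router_scores = [0.25, 0.5, 0.5, 0.125]
top2 = deterministic_topk(router_scores, k=2)

# Floating-point addition is not associative, so the same partial sums
# reduced in a different order can give a different result.
partials = [0.1, 0.2, 0.3]
left_to_right = reduce_in_fixed_order(partials)
right_to_left = reduce_in_fixed_order(list(reversed(partials)))
print(top2, left_to_right == right_to_left)
```

The second half is the reason a contract has to pin the reduction order: the two sums differ in the last bit even though the inputs are identical.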

Main Changes

Adds the Qwen3-MoE true-on-policy extension path at the MoE layer boundary.
Uses SGLang fused MoE behavior for the direct parity path and local autograd support for training.
Updates the SGLang fused MoE imports to the rebased moe_runner.triton_utils layout.
Keeps the MoE-specific parity path narrow; generic Megatron router/dispatcher fallback edits and debug-only helpers were cleaned out.
Preserves dense-stack behavior outside the explicit qwen3_moe_true_on_policy_v1 contract.
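The "narrow path" idea in the last two items can be sketched as a single predicate. Only the contract name comes from this PR; `LayerSpec`, `select_forward_path`, and the returned path labels are hypothetical and do not reflect the actual gate in `moe_layer_ext.py`.

```python
from dataclasses import dataclass
from typing import Optional

# Contract name taken from the PR text; everything else here is hypothetical.
TRUE_ON_POLICY_CONTRACT = "qwen3_moe_true_on_policy_v1"


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical description of one transformer layer."""

    is_moe: bool
    true_on_policy_contract: Optional[str] = None


def select_forward_path(layer: LayerSpec) -> str:
    """Choose the parity path only for MoE layers under the exact contract."""
    if layer.is_moe and layer.true_on_policy_contract == TRUE_ON_POLICY_CONTRACT:
        return "sglang_fused_moe"
    # Dense layers, and MoE layers without the contract, keep default behavior.
    return "megatron_default"


moe_with_contract = LayerSpec(is_moe=True, true_on_policy_contract=TRUE_ON_POLICY_CONTRACT)
dense_with_contract = LayerSpec(is_moe=False, true_on_policy_contract=TRUE_ON_POLICY_CONTRACT)
moe_without_contract = LayerSpec(is_moe=True)

for spec in (moe_with_contract, dense_with_contract, moe_without_contract):
    print(spec, "->", select_forward_path(spec))
```

Keeping the decision in one predicate means anything outside the contract falls through to the unchanged default path.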

Validation

Latest rebased EP4 E2E on ion7, Qwen3-30B-A3B, 8x H200, run qwen3-moe-top-ep4-rebased-e2e-260522-ion7-r4:

Ray job: raysubmit_yc4WK8aCKzP4tx1Q
Train: TP=1, EP=4, ETP=1, CP=2
Rollout: rollout_num_gpus=8, rollout_num_gpus_per_engine=4, SGLang TP=4, SGLang EP=4
Runtime confirmed: recompute_logprobs_via_prefill=False, use_rollout_logprobs=False
rollout logprob: -2.8848648071289062e-05
ref logprob: -2.8848648071289062e-05
train logprob: -2.8848648071289062e-05
train/train_rollout_logprob_abs_diff = 0.0
train/train_rollout_kl = 0.0
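For readers unfamiliar with the two metrics above, the following sketch shows one plausible way such values could be computed from per-token logprobs. The exact definitions behind `train/train_rollout_logprob_abs_diff` and `train/train_rollout_kl` are not shown in this PR, so the formulas here (mean absolute difference, and a sampled KL estimate over rollout tokens) are assumptions, as are the function names.

```python
from typing import Sequence


def logprob_abs_diff(train: Sequence[float], rollout: Sequence[float]) -> float:
    """Mean absolute per-token difference between two logprob sequences."""
    assert len(train) == len(rollout) and len(train) > 0
    return sum(abs(t - r) for t, r in zip(train, rollout)) / len(train)


def sampled_kl(train: Sequence[float], rollout: Sequence[float]) -> float:
    """Sampled estimate of KL(rollout || train).

    Assumes the tokens were sampled from the rollout policy, so the
    estimate is the mean of (rollout_logprob - train_logprob).
    """
    assert len(train) == len(rollout) and len(train) > 0
    return sum(r - t for t, r in zip(train, rollout)) / len(train)


# The logprob value reported in the validation run, repeated for a few
# tokens purely for illustration.
reported = -2.8848648071289062e-05
rollout_logprobs = [reported, reported, reported]
train_logprobs = [reported, reported, reported]

exact_diff = logprob_abs_diff(train_logprobs, rollout_logprobs)
exact_kl = sampled_kl(train_logprobs, rollout_logprobs)

# A perturbation on one token is enough to make the difference nonzero.
perturbed = [reported, reported + 1e-7, reported]
perturbed_diff = logprob_abs_diff(perturbed, rollout_logprobs)
print(exact_diff, exact_kl, perturbed_diff > 0.0)
```

Under these definitions, an exact `0.0` for both metrics is only possible when every per-token logprob matches bit for bit, which is what the run above reports.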

Local record:
recovery/qwen3_moe_clean/journal/2026-05-22-qwen3-moe-clean-e2e.md

Test Plan

Rebased onto current radixark/Megatron-LM:miles-main stack.
Local syntax/diff sanity checks for touched Python paths.
Remote import/syntax checks in the ion7 container.
EP4 E2E exact-zero logprob validation on ion7.
CI replay for the three-PR stack.
Longer on/off-policy throughput comparison after reviewer signoff on the cleaned deterministic path.

Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- forward.py: single canonical sglang_moe_forward (102 lines)
- autograd.py: pure backward wrapper calling shared forward (182 lines)
- moe_experts.py: weight-layout adapter only, no forward logic (36 lines)
- moe_layer_ext.py: one linear orchestration path (301 lines)
- Remove verbose RuntimeError guards, replace with asserts
- Remove weight caching (premature optimization)
- Consolidate two parallel forward paths into one

Net: -574 lines, structurally impossible for forward paths to diverge.

Co-authored-by: Cursor <cursoragent@cursor.com>

This was referenced May 1, 2026

[feat] Add Qwen3 MoE true-on-policy parity radixark/miles#1059

Open

[feat] Init true on policy with qwen_moe maocheng23/sglang#3

Closed

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from c63c77f to 9c390a1 Compare May 5, 2026 01:51

maocheng23 mentioned this pull request May 7, 2026

[feat] Add SP and PP kernels for qwen_moe true on policy #31

Open

3 tasks

maocheng23 force-pushed the feat/true_on_policy_qwen_dense branch 2 times, most recently from 9546575 to 57258c8 Compare May 18, 2026 06:15

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch 3 times, most recently from ab103f8 to 9c6c2a9 Compare May 19, 2026 18:44

maocheng23 changed the title from "[feat] Init true on policy with qwen_moe" to "[feat] Add Qwen3 MoE true-on-policy parity" May 21, 2026

maocheng23 and others added 20 commits May 22, 2026 19:02

Improve MoE true-on-policy training path

995fad7

Remove MoE straight-through fallback

e5ab72a

Drop GroupedMLP fallback dtype patch

ef0fd3e

Prune unused MoE parity fallback paths

6a88277

Make true-on-policy MoE hook policy driven

ea3ce92

Clarify true-on-policy MoE dispatch gate

666ad16

Align MoE direct predicate with true-on-policy contract

24088bf

Simplify MoE true-on-policy mode gate

cfe1c38

Drive MoE kernel gate from true-on-policy contract

c32f169

Align MoE contract gate with direct SGLang path

a143c75

Remove true-on-policy debug dump plumbing

dcf88ac

Keep GRPO loss unchanged for MoE parity

5954723

Remove batch-invariant escape hatch

1ef3a0e

Share SGLang MoE forward path

ae93354

maocheng23 added 2 commits May 22, 2026 19:02

Fix SGLang MoE weight adapter init

1e562d7

Update SGLang fused MoE imports

b0679e2

maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from 7ac2a80 to b0679e2 Compare May 23, 2026 02:47

maocheng23 mentioned this pull request May 23, 2026

[feat] Add Qwen3 MoE true-on-policy parity sgl-project/sglang#24408

Open

6 tasks

maocheng23 marked this pull request as ready for review May 23, 2026 21:22

maocheng23 changed the base branch from feat/true_on_policy_qwen_dense to miles-main May 23, 2026 22:20

maocheng23 wants to merge 22 commits into miles-main from feat/true_on_policy_qwen_moe
