[feat] Add Qwen3 MoE true-on-policy parity #1059

Open
maocheng23 wants to merge 14 commits into main from feat/true_on_policy_qwen_moe
maocheng23 (Contributor) commented May 1, 2026

Summary

Adds the Miles side of the Qwen3-30B-A3B MoE true-on-policy stack on top of radixark/miles:main.

This PR is one of three coupled PRs that must land together because they share the qwen3_moe_true_on_policy_v1 contract and were validated as one end-to-end stack.

Companion PRs:

Target

Bit-identical rollout/train logprob parity for Qwen3-30B-A3B MoE under deterministic decode.

SGLang remains the numerical source of truth. Miles owns the launch-plan contract, topology validation, and environment/argument plumbing that keeps SGLang rollout and Megatron training on the same policy.
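
To make the topology-validation responsibility concrete, here is a minimal sketch of the kind of check a launch plan could run before starting rollout and training. The `Topology` class, its field names, and every rule below are illustrative assumptions, not the actual Miles contract code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Hypothetical container; the field names are illustrative, not Miles APIs."""
    train_gpus: int
    train_tp: int
    train_ep: int
    train_etp: int
    train_cp: int
    rollout_num_gpus: int
    rollout_num_gpus_per_engine: int
    sglang_tp: int
    sglang_ep: int


def validate_topology(t: Topology) -> int:
    """Return the number of rollout engines, or raise ValueError on a mismatch."""
    for name, value in vars(t).items():
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    if t.rollout_num_gpus % t.rollout_num_gpus_per_engine != 0:
        raise ValueError("rollout GPUs must split evenly into engines")
    # Assumption: each SGLang engine uses all of its GPUs for tensor parallelism.
    if t.sglang_tp != t.rollout_num_gpus_per_engine:
        raise ValueError("SGLang TP must equal GPUs per engine")
    # Assumption: expert parallelism is carved out of the TP group.
    if t.sglang_tp % t.sglang_ep != 0:
        raise ValueError("SGLang EP must divide SGLang TP")
    # Assumption: the training world must tile evenly by TP*CP and by ETP*EP.
    if t.train_gpus % (t.train_tp * t.train_cp) != 0:
        raise ValueError("train GPUs must be divisible by TP*CP")
    if t.train_gpus % (t.train_etp * t.train_ep) != 0:
        raise ValueError("train GPUs must be divisible by ETP*EP")
    # Assumed contract rule: both sides partition experts the same way.
    if t.train_ep != t.sglang_ep:
        raise ValueError("train EP and SGLang EP must match")
    return t.rollout_num_gpus // t.rollout_num_gpus_per_engine
```

With the EP4 configuration reported under Validation (train TP=1, EP=4, ETP=1, CP=2; rollout 8 GPUs at 4 per engine; SGLang TP=4, EP=4), this sketch accepts the layout and reports two rollout engines.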

Main Changes

  • Adds the qwen3_moe_true_on_policy_v1 contract and Qwen3-MoE launch profile.
  • Wires deterministic SGLang decode args, EP sizing, CP layout, and MoE contract env vars through the true-on-policy launch plan.
  • Keeps the deterministic correctness path; the prefill-only fast-decode/recompute path and debug/benchmark-only changes have been removed.
  • Preserves upstream custom-allreduce defaults while disabling the incompatible path only under the deterministic true-on-policy contract.
  • Updates logprob/loss plumbing for current upstream Miles APIs, including CP slicing with the new ParallelState.cp.rank/size shape.
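
For readers unfamiliar with CP slicing, the sketch below shows one common layout: the load-balanced "zigzag" split, where the sequence is cut into 2 × cp_size chunks and each rank keeps one chunk from the front and its mirror from the back. This is an assumption for illustration; the layout Miles actually uses and the real `ParallelState.cp` interface may differ, and `cp_slice` is a hypothetical helper that takes `rank` and `size` as plain integers.

```python
from typing import List, Sequence


def cp_slice(values: Sequence[float], cp_rank: int, cp_size: int) -> List[float]:
    """Return the per-token values owned by one context-parallel rank.

    Assumes the zigzag layout: 2 * cp_size equal chunks, with rank r holding
    chunk r and chunk (2 * cp_size - 1 - r).
    """
    if cp_size < 1 or not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must satisfy 0 <= cp_rank < cp_size")
    n_chunks = 2 * cp_size
    # Assumption: the sequence was already padded to a multiple of 2 * cp_size.
    if len(values) % n_chunks != 0:
        raise ValueError("sequence length must be a multiple of 2 * cp_size")
    chunk = len(values) // n_chunks
    mirror = n_chunks - 1 - cp_rank
    front = values[cp_rank * chunk : (cp_rank + 1) * chunk]
    back = values[mirror * chunk : (mirror + 1) * chunk]
    return list(front) + list(back)


if __name__ == "__main__":
    tokens = list(range(8))
    for rank in range(2):
        print(rank, cp_slice(tokens, rank, 2))
```

Under this layout every token belongs to exactly one rank, so per-rank logprobs can be gathered back into the full sequence without double counting.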

Validation

Latest rebased EP4 E2E on ion7, Qwen3-30B-A3B, 8x H200, run qwen3-moe-top-ep4-rebased-e2e-260522-ion7-r4:

  • Ray job: raysubmit_yc4WK8aCKzP4tx1Q
  • Train: TP=1, EP=4, ETP=1, CP=2
  • Rollout: rollout_num_gpus=8, rollout_num_gpus_per_engine=4, SGLang TP=4, SGLang EP=4
  • Runtime confirmed: recompute_logprobs_via_prefill=False, use_rollout_logprobs=False
  • rollout logprob: -2.8848648071289062e-05
  • ref logprob: -2.8848648071289062e-05
  • train logprob: -2.8848648071289062e-05
  • train/train_rollout_logprob_abs_diff = 0.0
  • train/train_rollout_kl = 0.0
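
The exact-zero metrics above can be reproduced in spirit with a small check like the following. It is a sketch only: `parity_report` is a hypothetical helper, the KL value uses a simple mean of `rollout - train` over sampled tokens (an assumed estimator, not necessarily the one behind `train/train_rollout_kl`), and bit identity is tested on Python doubles rather than on the original tensor dtype.

```python
import struct
from typing import Dict, Sequence


def float_bits(x: float) -> bytes:
    """Raw IEEE-754 double encoding, so equality means bit-for-bit identical."""
    return struct.pack("<d", x)


def parity_report(rollout: Sequence[float], train: Sequence[float]) -> Dict[str, object]:
    """Compare per-token logprobs from rollout and training."""
    if len(rollout) != len(train):
        raise ValueError("logprob sequences must have the same length")
    if not rollout:
        raise ValueError("need at least one token")
    n = len(rollout)
    abs_diff = sum(abs(t - r) for r, t in zip(rollout, train)) / n
    # Assumed estimator: mean log-ratio on tokens sampled from the rollout policy.
    kl = sum(r - t for r, t in zip(rollout, train)) / n
    bit_identical = all(float_bits(r) == float_bits(t) for r, t in zip(rollout, train))
    return {"abs_diff": abs_diff, "kl": kl, "bit_identical": bit_identical}


if __name__ == "__main__":
    value = -2.8848648071289062e-05  # the logprob reported in the run above
    print(parity_report([value], [value]))
```

A bitwise comparison is stricter than `abs_diff == 0.0`: it also distinguishes `0.0` from `-0.0`, which an arithmetic difference would not.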

Local record:
recovery/qwen3_moe_clean/journal/2026-05-22-qwen3-moe-clean-e2e.md

Test Plan

  • Rebased onto current upstream Miles main.
  • Local syntax/diff sanity checks for touched Python paths.
  • Remote import/syntax checks in the ion7 container.
  • EP4 E2E exact-zero logprob validation on ion7.
  • CI replay for the three-PR stack.
  • Longer on/off-policy throughput comparison after reviewer signoff on the cleaned deterministic path.

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces support for Qwen3 MoE true-on-policy training, adding specialized parallel layouts, kernel policies, and model profiles. Key features include weight and gradient auditing for debugging, optimized weight synchronization for expert parallelism, and improved prefill logprob recomputation using full-sequence scoring and batching. Feedback is provided regarding a logic error in the weight auditing sampling function where small tensors result in tripled statistics due to overlapping slices during concatenation.

Comment on lines +116 to +124
            sample_size = min(4096, numel)
            midpoint = max(0, (numel - sample_size) // 2)
            sample = torch.cat(
                [
                    flat[:sample_size],
                    flat[midpoint : midpoint + sample_size],
                    flat[-sample_size:],
                ]
            )

medium

The current sampling logic for weight auditing can lead to incorrect statistics for small tensors. When numel <= 4096, sample_size becomes numel, and the torch.cat operation results in sample containing three copies of the original flat tensor. This will cause sample_sum to be three times the actual sum, skewing the audit results. To fix this, handle small tensors as a special case and retrieve the sample size from the model configuration instead of hardcoding it.

            sample_size = config.audit_sample_size
            if numel <= sample_size:
                sample = flat
            else:
                midpoint = max(0, (numel - sample_size) // 2)
                sample = torch.cat(
                    [
                        flat[:sample_size],
                        flat[midpoint : midpoint + sample_size],
                        flat[-sample_size:],
                    ]
                )
References
  1. Model parameters, such as index_topk, should be retrieved from the model configuration rather than being hardcoded.
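
One way to see the overlap problem, and a possible way around it, is to compute the sampling windows as index ranges first. The sketch below is a hypothetical variant, not the suggested patch: it falls back to the whole tensor whenever the three windows could overlap (`numel <= 3 * sample_size`), which also covers tensors slightly larger than `sample_size`, where the head, middle, and tail slices still share elements. It uses plain Python sequences so it runs without PyTorch, and `sample_ranges` and `gather_sample` are illustrative names.

```python
from typing import List, Sequence, Tuple


def sample_ranges(numel: int, sample_size: int = 4096) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, stop) windows: head, middle, tail."""
    if numel < 0 or sample_size < 1:
        raise ValueError("numel must be >= 0 and sample_size >= 1")
    if numel == 0:
        return []
    if numel <= 3 * sample_size:
        # The three windows would share elements, so take everything once.
        return [(0, numel)]
    midpoint = (numel - sample_size) // 2
    return [
        (0, sample_size),
        (midpoint, midpoint + sample_size),
        (numel - sample_size, numel),
    ]


def gather_sample(flat: Sequence[float], sample_size: int = 4096) -> List[float]:
    """Collect the sampled elements; no element is counted more than once."""
    out: List[float] = []
    for start, stop in sample_ranges(len(flat), sample_size):
        out.extend(flat[start:stop])
    return out


if __name__ == "__main__":
    small = list(range(10))
    print(sum(gather_sample(small, sample_size=4096)), sum(small))
```

With a non-empty PyTorch tensor the same ranges could be applied as `torch.cat([flat[a:b] for a, b in sample_ranges(flat.numel(), sample_size)])`. The trade-off is that mid-sized tensors are read in full rather than sampled, which costs at most `3 * sample_size` elements.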

@maocheng23 maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from c8fc984 to eececd6 Compare May 5, 2026 01:51
@maocheng23 maocheng23 force-pushed the feat/true_on_policy_qwen_dense branch 2 times, most recently from 2047f7d to 798b791 Compare May 18, 2026 06:15
Base automatically changed from feat/true_on_policy_qwen_dense to main May 18, 2026 06:21
@maocheng23 maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch 2 times, most recently from 8a740d7 to 2cd7533 Compare May 19, 2026 18:22
maocheng23 and others added 14 commits May 22, 2026 18:42
Co-authored-by: zju-stu-lizheng <lizheng.cs@zju.edu.cn>
Co-authored-by: zyxiyy02 <282300612+zyxiyy02@users.noreply.github.com>
Co-authored-by: Yi Zhang <1109276519@qq.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n flow

@maocheng23 maocheng23 force-pushed the feat/true_on_policy_qwen_moe branch from fe23383 to e52a170 Compare May 23, 2026 02:47
@maocheng23 maocheng23 changed the title from "[feat] Init true on policy with qwen_moe" to "[feat] Add Qwen3 MoE true-on-policy parity" May 23, 2026
@maocheng23 maocheng23 marked this pull request as ready for review May 23, 2026 21:22