[model] feat: add Qwen3.5 text model bridges (dense + MoE) #3769

Open

HowardZorn wants to merge 1 commit into NVIDIA-NeMo:main from HowardZorn:qwen3_5_lm

Conversation

@HowardZorn

What does this PR do?

Add Megatron Bridge support for Qwen3.5 language models (Qwen3.5-LM), the text-only models extracted from Qwen3.5-VL. The model architecture is similar to Qwen3-Next (hybrid GDN + standard attention), but the HF checkpoint organization follows the Qwen3.5-VL convention (a model.language_model.* prefix instead of model.layers.*). Additionally, the MTP Megatron-Core path is renamed from transformer_layer to mtp_model_layer to align with Megatron-Core's actual module naming.
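
To make the prefix difference concrete, here is a minimal sketch of the key remapping; the parameter names are illustrative examples, not taken from the actual bridge code.

```python
# Hypothetical sketch of the two HF checkpoint layouts; key names are
# representative examples, not copied from real checkpoints.

# Qwen3-Next layout: decoder weights live directly under "model.layers.*".
qwen3_next_key = "model.layers.0.self_attn.q_proj.weight"

# Qwen3.5-LM layout (Qwen3.5-VL convention): the text decoder is nested
# under "model.language_model.*", so the same tensor appears as:
qwen3_5_key = "model.language_model.layers.0.self_attn.q_proj.weight"


def to_qwen3_5_key(key: str) -> str:
    """Remap a Qwen3-Next-style key to the Qwen3.5-VL-style prefix (sketch)."""
    old_prefix = "model.layers."
    if key.startswith(old_prefix):
        return "model.language_model.layers." + key[len(old_prefix):]
    return key


assert to_qwen3_5_key(qwen3_next_key) == qwen3_5_key
```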

Changelog

  • Add Qwen3_5Bridge for dense Qwen3.5 language models (qwen3_5_bridge.py)
  • Add Qwen3_5MoEBridge for MoE variant of Qwen3.5 language models (qwen3_5_moe_bridge.py)
  • Register both bridges in models/qwen/__init__.py with model types qwen3_5_text and qwen3_5_moe_text
  • Fix MTP parameter paths in qwen3_bridge.py: mtp.layers.*.transformer_layer.* → mtp.layers.*.mtp_model_layer.* (see the sketch after this list)
  • Fix MTP parameter paths in qwen3_next_bridge.py: same rename as above (13 mappings updated)
  • Add comprehensive unit tests for both bridges (test_qwen35_bridge.py)
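
As a sketch of the MTP path rename, the following shows the old and new Megatron-Core paths; the example parameter path is hypothetical, and the real bridges define their own mapping structures.

```python
# Sketch of the MTP parameter-path rename; the example path is illustrative.

def rename_mtp_path(path: str) -> str:
    """Rewrite the old Megatron-Core MTP module name to the actual one (sketch)."""
    return path.replace(".transformer_layer.", ".mtp_model_layer.")


old = "mtp.layers.0.transformer_layer.self_attention.linear_qkv.weight"
assert rename_mtp_path(old) == (
    "mtp.layers.0.mtp_model_layer.self_attention.linear_qkv.weight"
)
```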

Key design decisions

| Aspect | Qwen3.5-LM | Qwen3-Next (reference) |
| --- | --- | --- |
| HF param prefix | model.language_model.* | model.layers.* |
| Decoder expert format | Fused (gate_up_proj) | Individual (gate_proj/up_proj) |
| MTP expert format | Individual per-expert | Individual per-expert |
| GDN linear mapping | GDNLinearMappingSeparate (4-tensor: qkv/z/b/a) | GDNLinearMapping (2-tensor: qkvz/ba) |
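
To illustrate the fused-vs-individual expert format, here is a minimal sketch of splitting a fused gate_up_proj tensor back into separate gate/up tensors. It assumes the common convention that the two halves are concatenated along the output dimension; the shapes and variable names are hypothetical, not taken from the bridge code.

```python
import torch

# Hypothetical fused expert weight: gate and up projections concatenated
# along the output (first) dimension, as in the fused "gate_up_proj" format.
hidden_size, ffn_size = 8, 16
gate_up_proj = torch.randn(2 * ffn_size, hidden_size)

# Recover the individual tensors used by the Qwen3-Next-style layout.
gate_proj, up_proj = torch.chunk(gate_up_proj, 2, dim=0)

assert gate_proj.shape == (ffn_size, hidden_size)
assert up_proj.shape == (ffn_size, hidden_size)
```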

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests? — Unit tests for both dense and MoE bridges added
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? — No, uses existing dependencies only

Additional Information

  • Qwen3.5-LM is the standalone text model extracted from Qwen3.5-VL, sharing the same hybrid GDN+Attention architecture as Qwen3-Next but with VL-style checkpoint organization (required because Hugging Face Transformers does not accept other state-dict layouts).
  • A standalone Qwen3.5-LM is needed by me and other users. Please refer to [feature] Add qwen3.5 config + example for LLM only. #2973.

… MTP mcore path

Add Qwen3_5Bridge and Qwen3_5MoEBridge for HF ↔ Megatron-Core
weight conversion of Qwen3.5 language models with hybrid GDN+Attention
architecture. Also align MTP Megatron-Core parameter paths from
transformer_layer to mtp_model_layer in Qwen3 and Qwen3-Next bridges.

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
@copy-pr-bot

copy-pr-bot Bot commented May 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
