[model] feat: add Qwen3.5 text model bridges (dense + MoE) #3769

Open

HowardZorn wants to merge 1 commit into NVIDIA-NeMo:main from HowardZorn:qwen3_5_lm

Conversation

@HowardZorn

What does this PR do?

Add Megatron Bridge support for Qwen3.5 language models (Qwen3.5-LM), the text-only models extracted from Qwen3.5-VL. The model architecture is similar to Qwen3-Next (hybrid GDN + standard attention), but the HF checkpoint organization follows the Qwen3.5-VL convention (a model.language_model.* prefix instead of model.layers.*). Additionally, the MTP Megatron-Core path is renamed from transformer_layer to mtp_model_layer to align with Megatron-Core's actual module naming.
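
To make the prefix difference concrete, here is a minimal sketch of the key remapping; the parameter names are illustrative examples, not taken from the actual bridge code.

```python
# Hypothetical sketch of the two HF checkpoint layouts; key names are
# representative examples, not copied from real checkpoints.

# Qwen3-Next layout: decoder weights live directly under "model.layers.*".
qwen3_next_key = "model.layers.0.self_attn.q_proj.weight"

# Qwen3.5-LM layout (Qwen3.5-VL convention): the text decoder is nested
# under "model.language_model.*", so the same tensor appears as:
qwen3_5_key = "model.language_model.layers.0.self_attn.q_proj.weight"


def to_qwen3_5_key(key: str) -> str:
    """Remap a Qwen3-Next-style key to the Qwen3.5-VL-style prefix (sketch)."""
    old_prefix = "model.layers."
    if key.startswith(old_prefix):
        return "model.language_model.layers." + key[len(old_prefix):]
    return key


assert to_qwen3_5_key(qwen3_next_key) == qwen3_5_key
```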

Changelog

  • Add Qwen3_5Bridge for dense Qwen3.5 language models (qwen3_5_bridge.py)
  • Add Qwen3_5MoEBridge for MoE variant of Qwen3.5 language models (qwen3_5_moe_bridge.py)
  • Register both bridges in models/qwen/__init__.py with model types qwen3_5_text and qwen3_5_moe_text
  • Fix MTP parameter paths in qwen3_bridge.py: mtp.layers.*.transformer_layer.* → mtp.layers.*.mtp_model_layer.* (see the sketch after this list)
  • Fix MTP parameter paths in qwen3_next_bridge.py: same rename as above (13 mappings updated)
  • Add comprehensive unit tests for both bridges (test_qwen35_bridge.py)
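
As a sketch of the MTP path rename, the following shows the old and new Megatron-Core paths; the example parameter path is hypothetical, and the real bridges define their own mapping structures.

```python
# Sketch of the MTP parameter-path rename; the example path is illustrative.

def rename_mtp_path(path: str) -> str:
    """Rewrite the old Megatron-Core MTP module name to the actual one (sketch)."""
    return path.replace(".transformer_layer.", ".mtp_model_layer.")


old = "mtp.layers.0.transformer_layer.self_attention.linear_qkv.weight"
assert rename_mtp_path(old) == (
    "mtp.layers.0.mtp_model_layer.self_attention.linear_qkv.weight"
)
```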

Key design decisions

| Aspect | Qwen3.5-LM | Qwen3-Next (reference) |
| --- | --- | --- |
| HF param prefix | model.language_model.* | model.layers.* |
| Decoder expert format | Fused (gate_up_proj) | Individual (gate_proj/up_proj) |
| MTP expert format | Individual per-expert | Individual per-expert |
| GDN linear mapping | GDNLinearMappingSeparate (4-tensor: qkv/z/b/a) | GDNLinearMapping (2-tensor: qkvz/ba) |
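
To illustrate the fused-vs-individual expert format, here is a minimal sketch of splitting a fused gate_up_proj tensor back into separate gate/up tensors. It assumes the common convention that the two halves are concatenated along the output dimension; the shapes and variable names are hypothetical, not taken from the bridge code.

```python
import torch

# Hypothetical fused expert weight: gate and up projections concatenated
# along the output (first) dimension, as in the fused "gate_up_proj" format.
hidden_size, ffn_size = 8, 16
gate_up_proj = torch.randn(2 * ffn_size, hidden_size)

# Recover the individual tensors used by the Qwen3-Next-style layout.
gate_proj, up_proj = torch.chunk(gate_up_proj, 2, dim=0)

assert gate_proj.shape == (ffn_size, hidden_size)
assert up_proj.shape == (ffn_size, hidden_size)
```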

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests? — Unit tests for both dense and MoE bridges added
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? — No, uses existing dependencies only

Additional Information

  • Qwen3.5-LM is the standalone text model extracted from Qwen3.5-VL, sharing the same hybrid GDN+Attention architecture as Qwen3-Next but with VL-style checkpoint organization (required because Hugging Face Transformers does not accept other state-dict layouts).
  • A standalone Qwen3.5-LM is needed by me and other users. Please refer to [feature] Add qwen3.5 config + example for LLM only. #2973.

… MTP mcore path

Add Qwen3_5Bridge and Qwen3_5MoEBridge for HF ↔ Megatron-Core
weight conversion of Qwen3.5 language models with hybrid GDN+Attention
architecture. Also align MTP Megatron-Core parameter paths from
transformer_layer to mtp_model_layer in Qwen3 and Qwen3-Next bridges.

Signed-off-by: He Ruozhou <heruozhou@xiaohongshu.com>
@copy-pr-bot

copy-pr-bot Bot commented May 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
