Add Qwen3.5-MoE (35B-A3B) model support #2146

Open
tanzeel-amd wants to merge 11 commits into microsoft:main from tanzeel-amd:turrahma/qwen3.5-moe-support

@tanzeel-amd

Model builder and runtime support for Qwen3.5-MoE

  • Added Qwen35MoeTextModel builder class for Qwen3_5MoeForConditionalGeneration architecture with 256 experts, shared expert, and SwiGLU activation
  • Registered qwen3_5_moe as VLM type in C++ runtime (model_type.h, model.cpp)
  • Added architecture dispatch in builder.py for Qwen3_5MoeForConditionalGeneration
  • Key implementation details:
    • Repacks HF concatenated gate_up_proj to ORT interleaved format (swiglu_fusion=1)
    • Shared expert implemented as separate SiLU MLP path with sigmoid gating
    • Router uses bias-free MatMul matching Qwen3_5MoeTopKRouter
    • QMoE symmetric blockwise quantization without explicit zero_points
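The gate_up_proj repacking mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: it assumes the HF tensor stores all gate rows followed by all up rows along the output dimension, and that the interleaved layout alternates gate and up rows. The function name and the tensor shape are hypothetical.

```python
import numpy as np

def interleave_gate_up(gate_up: np.ndarray) -> np.ndarray:
    """Reorder [g0..gN-1, u0..uN-1] rows into [g0, u0, g1, u1, ...].

    gate_up is assumed to have shape (num_experts, 2 * inter, hidden),
    with the gate half first and the up half second (an assumption).
    """
    num_experts, two_inter, hidden = gate_up.shape
    inter = two_inter // 2
    gate = gate_up[:, :inter, :]
    up = gate_up[:, inter:, :]
    # Stack to (num_experts, inter, 2, hidden) so each gate row sits next
    # to its matching up row, then flatten back to 2 * inter rows.
    paired = np.stack([gate, up], axis=2)
    return paired.reshape(num_experts, two_inter, hidden)
```

With two gate rows and two up rows per expert, the row order changes from g0, g1, u0, u1 to g0, u0, g1, u1.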

Copilot AI review requested due to automatic review settings May 8, 2026 10:25
@tanzeel-amd tanzeel-amd requested a review from a team as a code owner May 8, 2026 10:25
Contributor

Copilot AI left a comment

Pull request overview

Adds builder + runtime plumbing for Qwen3.5-MoE export/inference, integrating a new MoE-capable Qwen3.5 builder into the Python model builder and registering the corresponding model type in the C++ runtime so the correct multi-modal processor/model-family behaviors can be selected at runtime.

Changes:

  • Introduces Qwen35MoeTextModel in the Python model builder with fused MoE/QMoE graph construction (router + routed experts + shared expert).
  • Adds builder dispatch/import wiring so Qwen3_5MoeForConditionalGeneration can be exported via builder.py.
  • Registers qwen3_5_moe in the C++ runtime as a VLM/Qwen-VL-family type and wires it to the existing Qwen image processor.
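As a reference for the router + routed experts + shared expert structure named in the first item, here is a minimal per-token NumPy sketch. It is not the exported ONNX graph: the softmax-then-top-k order, the renormalization of the selected weights, and the scalar sigmoid gate on the shared expert are assumptions, and all function and parameter names are hypothetical.

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # x: (hidden,), w_gate and w_up: (inter, hidden), w_down: (hidden, inter)
    return w_down @ (silu(w_gate @ x) * (w_up @ x))

def moe_token_forward(x, router_w, experts, shared, shared_gate_w, top_k):
    """experts is a list of (w_gate, w_up, w_down); shared is one such tuple."""
    # Router: bias-free MatMul, then softmax over experts.
    logits = router_w @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Keep the top_k experts and renormalize their weights (assumption).
    top = np.argsort(probs)[-top_k:]
    weights = probs[top] / probs[top].sum()
    routed = sum(w * swiglu_mlp(x, *experts[i]) for i, w in zip(top, weights))
    # Shared expert: a separate SiLU MLP path scaled by a sigmoid gate.
    gate = 1.0 / (1.0 + np.exp(-(shared_gate_w @ x)))
    return routed + gate * swiglu_mlp(x, *shared)
```

A plain reference like this can be handy for comparing against the fused graph output on a few random tokens, under the same assumptions.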

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/python/py/models/builders/qwen.py | Adds Qwen35MoeTextModel and fused MoE/QMoE subgraph generation for Qwen3.5-MoE. |
| src/python/py/models/builders/__init__.py | Exports Qwen35MoeTextModel from the builders package. |
| src/python/py/models/builder.py | Adds architecture dispatch for Qwen3_5MoeForConditionalGeneration. |
| src/models/model.cpp | Registers qwen3_5_moe with MultiModalProcessor’s processor factory. |
| src/models/model_type.h | Adds qwen3_5_moe to VLM and Qwen-VL-family classification helpers. |

Comment thread src/python/py/models/builders/qwen.py
@tanzeel-amd
Author

@microsoft-github-policy-service agree company="AMD"

@VishalX
Contributor

VishalX commented May 18, 2026

@kunal-vaishnavi pls review this. Olive recipes are added here: microsoft/olive-recipes#405

@baijumeswani
Collaborator

@tanzeel-amd could you please address copilot review comments if relevant?

@tanzeel-amd tanzeel-amd force-pushed the turrahma/qwen3.5-moe-support branch from 9d80321 to 0be688b Compare May 19, 2026 08:21
@tanzeel-amd
Author

@baijumeswani resolved the copilot comments. Please review.

@VishalX
Contributor

VishalX commented May 19, 2026

@baijumeswani resolved the copilot comments. Please review.

@kunal-vaishnavi

Comment thread src/models/model_type.h Outdated
Comment thread src/models/model_type.h Outdated
Comment thread src/python/py/models/builders/qwen.py Outdated
Comment thread src/python/py/models/builders/qwen.py Outdated
Comment thread src/python/py/models/builders/qwen.py Outdated
Comment thread src/python/py/models/builders/qwen.py Outdated
Comment thread src/python/py/models/builders/qwen.py Fixed
Comment thread src/python/py/models/builders/qwen.py Dismissed
Comment thread src/python/py/models/builders/base.py Outdated
Comment thread src/python/py/models/builders/base.py Outdated
@baijumeswani
Collaborator

@tanzeel-amd could you please resolve the merge conflict?

@tanzeel-amd
Author

Resolved conflicts @baijumeswani

@baijumeswani baijumeswani enabled auto-merge (squash) May 22, 2026 18:04
Comment thread src/python/py/models/builders/qwen.py Dismissed
- Added Qwen35MoeTextModel builder class for Qwen3_5MoeForConditionalGeneration
  architecture with 256 experts, shared expert, and SwiGLU activation
- Registered qwen3_5_moe as VLM type in C++ runtime (model_type.h, model.cpp)
- Added architecture dispatch in builder.py for Qwen3_5MoeForConditionalGeneration
- Key implementation details:
  - Repacks HF concatenated gate_up_proj to ORT interleaved format (swiglu_fusion=1)
  - Shared expert implemented as separate SiLU MLP path with sigmoid gating
  - Router uses bias-free MatMul matching Qwen3_5MoeTopKRouter
  - QMoE symmetric blockwise quantization without explicit zero_points
- Also includes existing gemma.py rope_local_base_freq fix for TranslateGemma
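The "symmetric blockwise quantization without explicit zero_points" item in the list above can be illustrated with a generic NumPy sketch. This is only the general technique under stated assumptions (4-bit signed range, block size 32, one scale per block); the packing and storage layout actually used by QMoE is not shown, and the function names are hypothetical.

```python
import numpy as np

def quantize_symmetric_blockwise(w: np.ndarray, block_size: int = 32, bits: int = 4):
    """Quantize w per block with a scale only; the zero point is implicitly 0."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    blocks = w.reshape(-1, block_size)              # size must divide evenly
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)     # avoid division by zero
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def dequantize(q: np.ndarray, scales: np.ndarray, shape) -> np.ndarray:
    # No zero point to subtract: value = q * scale.
    return (q.astype(np.float32) * scales).reshape(shape)
```

Because the range is symmetric around zero, only the per-block scales need to be stored alongside the quantized weights, which is why no zero_points tensor is required in this scheme.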
tanzeel-amd and others added 10 commits May 22, 2026 12:45
…ly, set model_type in __init__

- model_type.h: Merge duplicate copyright lines into 2025-2026 range

- model_type.h: Rewrite IsQwenVLFamily to use std::array + std::find consistent with other methods

- qwen.py: Set model_type in __init__ for both Qwen35TextModel and Qwen35MoeTextModel instead of hardcoding in make_genai_config. Removes the make_genai_config override entirely.

Co-authored-by: Cursor <cursoragent@cursor.com>
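The model_type change in the commit above follows a common pattern, sketched here in simplified form. The base class below is a hypothetical stand-in, not the real builder base class, and the config dictionary shape is invented for illustration; only the qwen3_5_moe string and the Qwen35MoeTextModel name come from this PR.

```python
class Model:
    """Hypothetical stand-in for the builder base class."""

    def __init__(self, config: dict):
        self.config = config
        self.model_type = config.get("model_type", "unknown")

    def make_genai_config(self) -> dict:
        # The base implementation reads self.model_type, so subclasses
        # do not need to override this method just to change the type.
        return {"model": {"type": self.model_type}}


class Qwen35MoeTextModel(Model):
    def __init__(self, config: dict):
        super().__init__(config)
        # Set once here instead of hardcoding it in a make_genai_config override.
        self.model_type = "qwen3_5_moe"
```

Setting the attribute in the constructor keeps the override surface small: the subclass changes one value and inherits the rest of the config generation unchanged.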
…eric int4 config

- base.py: Add make_fused_moe() supporting router with/without bias, 2-weight SwiGLU layout with interleaving, and optional shared expert. Add make_shared_expert() using wrapper methods (make_sigmoid, make_mul, etc.). Move MoE /mlp/ int4 config cleanup into make_int4_algo_config().

- qwen.py: Remove _make_moe_fused (~150 lines) and make_moe dispatcher. Replace with single make_fused_moe() call from base class. Remove int4 algo cleanup from __init__ (now in base).

Co-authored-by: Cursor <cursoragent@cursor.com>
Per reviewer feedback, MoE builders in this codebase follow a model-specific pattern rather than a shared base class method. Moved make_moe, make_shared_expert, and int4 config cleanup back to Qwen35MoeTextModel. Retained use of wrapper methods (make_sigmoid, make_mul, make_add) instead of raw make_node/make_value.

Co-authored-by: Cursor <cursoragent@cursor.com>
auto-merge was automatically disabled May 22, 2026 19:46

Head branch was pushed to by a user without write access

@tanzeel-amd tanzeel-amd force-pushed the turrahma/qwen3.5-moe-support branch from 1e78431 to 51155a0 Compare May 22, 2026 19:46
@baijumeswani baijumeswani enabled auto-merge (squash) May 22, 2026 19:51
6 participants