Add Qwen3.5-MoE (35B-A3B) model support #2146
Pull request overview
Adds builder and runtime plumbing for Qwen3.5-MoE export and inference. It integrates a new MoE-capable Qwen3.5 builder into the Python model builder and registers the corresponding model type in the C++ runtime, so that the correct multimodal processor and model-family behavior are selected at runtime.
Changes:
- Introduces `Qwen35MoeTextModel` in the Python model builder with fused MoE/QMoE graph construction (router + routed experts + shared expert).
- Adds builder dispatch/import wiring so `Qwen3_5MoeForConditionalGeneration` can be exported via `builder.py`.
- Registers `qwen3_5_moe` in the C++ runtime as a VLM/Qwen-VL-family type and wires it to the existing Qwen image processor.
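The routing step of the fused MoE subgraph can be illustrated with a minimal pure-Python sketch of a bias-free top-k router for a single token. It assumes the usual Qwen-style recipe (softmax over all expert logits, top-k selection, then renormalization of the selected weights); whether `Qwen3_5MoeTopKRouter` normalizes exactly this way is an assumption, and the function name `route` is hypothetical.

```python
import math


def route(x, w_router, top_k):
    """Select top_k experts for one token with a bias-free linear router.

    x: hidden vector (list of floats).
    w_router: one weight row per expert (num_experts x hidden_size), no bias.
    Assumption: softmax over all experts, top-k, then renormalize.
    """
    # Bias-free projection: logits[e] = <x, w_router[e]>
    logits = [sum(xi * wi for xi, wi in zip(x, row)) for row in w_router]
    # Numerically stable softmax over all experts
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [v / total for v in exps]
    # Indices of the top_k largest probabilities (ties go to the lower index)
    top = sorted(range(len(probs)), key=lambda e: (-probs[e], e))[:top_k]
    # Renormalize the selected weights so they sum to 1 (assumed behavior)
    s = sum(probs[e] for e in top)
    return [(e, probs[e] / s) for e in top]
```

Renormalizing after a full softmax gives the same weights as a softmax over only the selected logits, so either order is a valid way to express it in a graph.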
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/python/py/models/builders/qwen.py | Adds Qwen35MoeTextModel and fused MoE/QMoE subgraph generation for Qwen3.5-MoE. |
| `src/python/py/models/builders/__init__.py` | Exports Qwen35MoeTextModel from the builders package. |
| src/python/py/models/builder.py | Adds architecture dispatch for Qwen3_5MoeForConditionalGeneration. |
| src/models/model.cpp | Registers qwen3_5_moe with MultiModalProcessor’s processor factory. |
| src/models/model_type.h | Adds qwen3_5_moe to VLM and Qwen-VL-family classification helpers. |
@microsoft-github-policy-service agree company="AMD"
@kunal-vaishnavi please review this. The Olive recipes are added here: microsoft/olive-recipes#405
@tanzeel-amd, could you please address the Copilot review comments, if relevant?
Branch updated from 9d80321 to 0be688b.
@baijumeswani I resolved the Copilot comments. Please review.
@tanzeel-amd, could you please resolve the merge conflict?
Branch updated from 461dc38 to 1e78431.
Resolved the conflicts, @baijumeswani.
- Added Qwen35MoeTextModel builder class for Qwen3_5MoeForConditionalGeneration architecture with 256 experts, shared expert, and SwiGLU activation
- Registered qwen3_5_moe as VLM type in C++ runtime (model_type.h, model.cpp)
- Added architecture dispatch in builder.py for Qwen3_5MoeForConditionalGeneration
- Key implementation details:
  - Repacks HF concatenated gate_up_proj to ORT interleaved format (swiglu_fusion=1)
  - Shared expert implemented as separate SiLU MLP path with sigmoid gating
  - Router uses bias-free MatMul matching Qwen3_5MoeTopKRouter
  - QMoE symmetric blockwise quantization without explicit zero_points
- Also includes existing gemma.py rope_local_base_freq fix for TranslateGemma
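The gate_up_proj repack can be illustrated with a small pure-Python sketch for one expert. It assumes the checkpoint stores the gate rows first and the up rows second, and that the interleaved layout used with `swiglu_fusion=1` alternates one gate row and one up row; the helper names are hypothetical, and the activation shown is plain SwiGLU rather than every parameter the ORT kernel may expose.

```python
import math


def silu(v):
    """SiLU (swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def interleave_gate_up(gate_up_rows):
    """Reorder [g0, ..., g_{I-1}, u0, ..., u_{I-1}] into [g0, u0, g1, u1, ...].

    gate_up_rows holds the 2*I output rows of ONE expert's fused projection,
    gate half first (assumed checkpoint layout).
    """
    if len(gate_up_rows) % 2:
        raise ValueError("expected an even number of rows (gate + up)")
    half = len(gate_up_rows) // 2
    out = []
    for g_row, u_row in zip(gate_up_rows[:half], gate_up_rows[half:]):
        out.extend([g_row, u_row])
    return out


def swiglu_concatenated(h):
    """SwiGLU on a concatenated output: silu(gate) * up, halves split."""
    half = len(h) // 2
    return [silu(g) * u for g, u in zip(h[:half], h[half:])]


def swiglu_interleaved(h):
    """SwiGLU on an interleaved output: silu(h[2i]) * h[2i + 1]."""
    return [silu(h[i]) * h[i + 1] for i in range(0, len(h), 2)]
```

Both layouts produce the same activation; the repack only permutes the rows of the fused projection, so it can be done once at export time with no runtime cost.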
…ly, set model_type in __init__

- model_type.h: Merge duplicate copyright lines into 2025-2026 range
- model_type.h: Rewrite IsQwenVLFamily to use std::array + std::find consistent with other methods
- qwen.py: Set model_type in __init__ for both Qwen35TextModel and Qwen35MoeTextModel instead of hardcoding in make_genai_config. Removes the make_genai_config override entirely.

Co-authored-by: Cursor <cursoragent@cursor.com>
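The `model_type` refactor amounts to setting an attribute once in `__init__` and letting a shared genai-config writer read it. A minimal sketch follows; the class names are hypothetical stand-ins for the builder classes, and the `{"model": {"type": ...}}` shape is an assumed slice of `genai_config.json`, not the builder's full output.

```python
class BaseModelSketch:
    """Hypothetical stand-in for the builder's shared base class."""

    def __init__(self, config):
        self.config = config
        # Fallback value; subclasses overwrite it in their own __init__.
        self.model_type = config.get("model_type", "unknown")

    def make_genai_config(self):
        # Reads the attribute instead of hardcoding a per-model string,
        # so subclasses no longer need to override this method.
        return {"model": {"type": self.model_type}}


class Qwen35MoeTextModelSketch(BaseModelSketch):
    """Hypothetical stand-in for Qwen35MoeTextModel."""

    def __init__(self, config):
        super().__init__(config)
        # Assumed to match the type string registered in the C++ runtime.
        self.model_type = "qwen3_5_moe"
```

Keeping the string in one place means the exported config and the runtime registration only have to agree on a single value per model class.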
…eric int4 config

- base.py: Add make_fused_moe() supporting router with/without bias, 2-weight SwiGLU layout with interleaving, and optional shared expert. Add make_shared_expert() using wrapper methods (make_sigmoid, make_mul, etc.). Move MoE /mlp/ int4 config cleanup into make_int4_algo_config().
- qwen.py: Remove _make_moe_fused (~150 lines) and make_moe dispatcher. Replace with single make_fused_moe() call from base class. Remove int4 algo cleanup from __init__ (now in base).

Co-authored-by: Cursor <cursoragent@cursor.com>
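The QMoE weights use symmetric blockwise quantization, so each block carries only a scale and its zero point is implicitly zero. A minimal pure-Python sketch for one weight row follows; the default block size of 32, the signed int4 range [-8, 7], and the helper names are assumptions for illustration, and packing the 4-bit values into bytes is omitted.

```python
def quantize_symmetric_blockwise(row, block_size=32, bits=4):
    """Quantize one weight row with a per-block scale and no zero point."""
    qmax = 2 ** (bits - 1) - 1     # 7 for int4
    qmin = -(2 ** (bits - 1))      # -8 for int4
    q, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the largest magnitude to qmax; an all-zero block keeps scale 1.0.
        scale = amax / qmax if amax > 0 else 1.0
        scales.append(scale)
        q.extend(max(qmin, min(qmax, round(v / scale))) for v in block)
    return q, scales


def dequantize_symmetric_blockwise(q, scales, block_size=32):
    """Reconstruct weights as q * scale; the zero point is implicitly 0."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]
```

Because the scale maps each block's largest magnitude to the top of the range, the reconstruction error per element is at most half a scale step, and no zero_points tensor needs to be stored.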
Per reviewer feedback, MoE builders in this codebase follow a model-specific pattern rather than a shared base class method. Moved make_moe, make_shared_expert, and int4 config cleanup back to Qwen35MoeTextModel. Retained use of wrapper methods (make_sigmoid, make_mul, make_add) instead of raw make_node/make_value.

Co-authored-by: Cursor <cursoragent@cursor.com>
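The shared-expert path (a SiLU MLP whose output is scaled by a sigmoid gate and then added to the routed-expert output) can be sketched for one token in pure Python. This follows the shared-expert formulation of earlier Qwen MoE models in Hugging Face Transformers; treating Qwen3.5-MoE identically is an assumption, and all function and weight names here are hypothetical.

```python
import math


def silu(v):
    """SiLU (swish): v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def matvec(w, x):
    """Dense matrix-vector product; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def shared_expert(x, w_gate, w_up, w_down, w_shared_gate):
    """Gated shared-expert branch for one token.

    w_gate, w_up: intermediate x hidden; w_down: hidden x intermediate;
    w_shared_gate: a single row (1 x hidden) that yields one scalar gate.
    """
    # SiLU MLP: down(silu(gate(x)) * up(x))
    h = [silu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    y = matvec(w_down, h)
    # Scalar sigmoid gate computed from the same input token
    gate = 1.0 / (1.0 + math.exp(-matvec(w_shared_gate, x)[0]))
    return [gate * v for v in y]


def moe_block_output(routed_out, shared_out):
    """Sum the routed-expert output and the gated shared-expert output."""
    return [r + s for r, s in zip(routed_out, shared_out)]
```

In a graph builder the same computation maps onto MatMul, Sigmoid, Mul, and Add nodes, which is why wrapper methods such as make_sigmoid, make_mul, and make_add are sufficient for this branch.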
Head branch was pushed to by a user without write access
Branch updated from 1e78431 to 51155a0.
Model builder and runtime support for Qwen3.5-MoE