Add Qwen3.5-35B-A3B MoE VLM ONNX export recipe by tanzeel-amd · Pull Request #405 · microsoft/olive-recipes

tanzeel-amd · 2026-05-08T10:34:50Z

- Olive recipe for exporting Qwen/Qwen3.5-35B-A3B (256 experts, 8 routed + 1 shared)
- Three sub-model pipeline: text decoder (INT4 QMoE), embedding (FP32), vision (FP32)
- Custom ONNX-export-friendly MoE model class (`codes/modeling_qwen3_5_moe.py`)
- Inference script with text, image, interactive, and benchmark modes
- Requires ORT GenAI built with `qwen3_5_moe` support (see DEBUG_STATUS.md)
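For readers unfamiliar with the "256 experts, 8 routed + 1 shared" layout, here is a minimal pure-Python sketch of one MoE layer applied to a single token. It assumes softmax router probabilities, top-8 selection with the selected weights renormalized to sum to 1, and a shared expert whose output is simply added. These are illustrative assumptions: the recipe's `codes/modeling_qwen3_5_moe.py` class and the INT4 QMoE decoder may differ (for example by gating the shared expert). All function names and the toy experts are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k most probable experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    denom = sum(probs[i] for i in top)
    return [(i, probs[i] / denom) for i in top]


def moe_forward(x: Vector, router_logits: Sequence[float], experts: Sequence[Expert],
                shared_expert: Expert, k: int = 8) -> Vector:
    """Shared-expert output plus the weighted sum of the k routed experts' outputs."""
    out = list(shared_expert(x))  # the shared expert runs for every token (ungated in this sketch)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy setup: 256 experts where expert i multiplies its input by i.
NUM_EXPERTS, TOP_K = 256, 8
experts = [lambda v, s=i: [s * t for t in v] for i in range(NUM_EXPERTS)]


def shared_identity(v: Vector) -> Vector:
    return list(v)


logits = [float(i % 17) for i in range(NUM_EXPERTS)]  # arbitrary router scores for one token
print(moe_forward([1.0], logits, experts, shared_identity, k=TOP_K))
```

With tied router scores, the stable sort keeps the lowest indices, so this toy token goes to the first eight experts scoring 16, each with weight 1/8.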

Copilot

Pull request overview

Adds a new Olive recipe to export and run Qwen/Qwen3.5-35B-A3B as a three-submodel ONNX Runtime GenAI pipeline (vision encoder + embedding fusion + INT4 text decoder), including a custom ONNX-export-friendly MoE model shell and an inference/benchmark script.
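The "embedding fusion" step can be pictured like this: the embedding sub-model looks up token embeddings, then writes the vision encoder's output rows into the positions of image placeholder tokens. Below is a minimal pure-Python sketch. It assumes a single hypothetical `IMAGE_TOKEN_ID` and exactly one vision feature row per placeholder. The real input and output names of `embedding.onnx` are defined by the recipe and are not reproduced here.

```python
from typing import List, Sequence

IMAGE_TOKEN_ID = 999  # hypothetical placeholder id; the real one comes from the tokenizer config


def fuse_embeddings(input_ids: Sequence[int], embed_table: Sequence[Sequence[float]],
                    image_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look up text embeddings, replacing each image placeholder with the next vision feature row."""
    n_placeholders = sum(1 for t in input_ids if t == IMAGE_TOKEN_ID)
    if n_placeholders != len(image_features):
        raise ValueError(f"{n_placeholders} image placeholders but {len(image_features)} feature rows")
    feats = iter(image_features)
    fused: List[List[float]] = []
    for tok in input_ids:
        if tok == IMAGE_TOKEN_ID:
            fused.append(list(next(feats)))  # vision output replaces the placeholder embedding
        else:
            fused.append(list(embed_table[tok]))  # ordinary token-embedding lookup
    return fused


table = [[float(i), float(i)] for i in range(10)]  # toy 10-token vocabulary, hidden size 2
ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 4]
print(fuse_embeddings(ids, table, [[0.5, 0.5], [0.7, 0.7]]))
# [[1.0, 1.0], [0.5, 0.5], [0.7, 0.7], [4.0, 4.0]]
```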

Changes:

- Introduces a custom `Qwen3_5MoeModel` implementation used for ONNX export of the vision and embedding submodels.
- Adds Olive JSON pipelines for exporting and optimizing `vision.onnx` and `embedding.onnx`, and for building `text.onnx` via ModelBuilder (INT4).
- Adds end-to-end `optimize.py` config generation and an `inference.py` runner with interactive and benchmark modes plus an optional PyTorch comparison.
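The text decoder is the only INT4 sub-model. Below is a minimal sketch of symmetric blockwise INT4 quantization with one float scale per block. It is a generic illustration of 4-bit weight-only quantization and not the exact format that ModelBuilder or the QMoE operator emits. The block size of 32 and the symmetric scheme are assumptions.

```python
from typing import List, Sequence, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range


def quantize_int4_blockwise(weights: Sequence[float], block_size: int = 32) -> Tuple[List[int], List[float]]:
    """Symmetric per-block INT4 quantization: one float scale per block_size weights."""
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the block's largest magnitude to 7; an all-zero block gets scale 1.0.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in block)
    return q, scales


def dequantize_int4_blockwise(q: Sequence[int], scales: Sequence[float], block_size: int = 32) -> List[float]:
    """Approximate the original weights as code * scale of the owning block."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]


weights = [0.7, -0.7, 0.33, 0.0, 0.01, -0.2, 0.3, 0.05]
q_codes, scales = quantize_int4_blockwise(weights, block_size=4)
restored = dequantize_int4_blockwise(q_codes, scales, block_size=4)
print(q_codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Mapping each block's largest magnitude to 7 keeps the per-weight reconstruction error within half a scale step, at the cost of never using the code -8.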

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| `Qwen-Qwen3.5-35B-A3B/LICENSE` | Adds upstream Apache-2.0 license text for the recipe content. |
| `Qwen-Qwen3.5-35B-A3B/builtin/user_script.py` | Provides Olive model loaders and dummy inputs for exporting embedding/vision via a custom model shell. |
| `Qwen-Qwen3.5-35B-A3B/builtin/optimize.py` | Orchestrates Olive runs, patches `genai_config.json`, writes `processor_config.json`, and applies tokenizer fixups. |
| `Qwen-Qwen3.5-35B-A3B/builtin/inference.py` | Adds ORT GenAI inference script with interactive mode and benchmarking (optionally vs PyTorch). |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/text.json` | Olive pipeline to build the INT4 text decoder via ModelBuilder. |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/embedding.json` | Olive pipeline to export the embedding fusion model and apply graph surgeries/optimizations. |
| `Qwen-Qwen3.5-35B-A3B/builtin/cpu_and_mobile/vision.json` | Olive pipeline to export the vision encoder, apply PackedAttention surgery, and run optimization passes. |
| `Qwen-Qwen3.5-35B-A3B/builtin/codes/modeling_qwen3_5_moe.py` | Custom ONNX-export-friendly model implementation (vision + embedding shell + MoE text components). |
| `Qwen-Qwen3.5-35B-A3B/builtin/codes/__init__.py` | Initializes the `codes` module for imports. |
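The `optimize.py` row mentions patching `genai_config.json` after the Olive runs. Below is a minimal sketch of that kind of post-processing using the standard `json` module. The key layout (`model.vision.filename` and `model.embedding.filename` next to `model.decoder`) is an assumption for illustration, because the schema that the ORT GenAI `qwen3_5_moe` loader expects is not shown in this PR. The function name `patch_genai_config` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def patch_genai_config(config_path: Path, vision_file: str, embedding_file: str) -> dict:
    """Add vision/embedding sub-model entries to an existing genai_config.json in place.

    Assumed layout: model.vision.filename and model.embedding.filename.
    This is not the confirmed ORT GenAI schema for this model.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    model = config.setdefault("model", {})
    model.setdefault("vision", {})["filename"] = vision_file
    model.setdefault("embedding", {})["filename"] = embedding_file
    config_path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config


# Usage against a throwaway file standing in for the ModelBuilder output.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "genai_config.json"
    cfg.write_text(json.dumps({"model": {"decoder": {"filename": "text.onnx"}}}), encoding="utf-8")
    patched = patch_genai_config(cfg, "vision.onnx", "embedding.onnx")
    print(sorted(patched["model"]))  # ['decoder', 'embedding', 'vision']
```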


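+    import glob
+    import os
+
+    from huggingface_hub import snapshot_download
+    from safetensors.torch import load_file
+
+    # snapshot_download() fetches config.json and every *.safetensors shard and returns the
+    # local snapshot folder (a single-file hf_hub_download() would leave uncached shards missing).
+    model_dir = snapshot_download(model_path, allow_patterns=["config.json", "*.safetensors"])
+    st_files = sorted(glob.glob(os.path.join(model_dir, "*.safetensors")))
+
+    prefix = "model."
+    lm_prefix = "language_model."
+    state_dict = {}
+    for sf in st_files:
+        tensors = load_file(sf)
+        for k, v in tensors.items():
+            if k.startswith(prefix):
+                # Strip "model." so keys line up with the custom export shell.
+                stripped = k[len(prefix):]
+                state_dict[stripped] = v
+                # Also expose the token embedding at top level (without "language_model.").
+                if stripped.startswith(lm_prefix + "embed_tokens."):
+                    state_dict[stripped[len(lm_prefix):]] = v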
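The loader above keeps only `model.`-prefixed tensors, strips that prefix, and registers the token embedding a second time without the `language_model.` prefix. The sketch below applies the same remapping as a pure function over key names, so it can be checked without downloading any shards. The helper name `remap_keys` is hypothetical and the example keys are illustrative.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")

PREFIX = "model."
LM_PREFIX = "language_model."


def remap_keys(tensors: Mapping[str, T]) -> Dict[str, T]:
    """Mirror the loader's key handling on a plain mapping of name -> tensor."""
    state_dict: Dict[str, T] = {}
    for key, value in tensors.items():
        if not key.startswith(PREFIX):
            continue  # keys outside "model." (e.g. lm_head.*) are dropped
        stripped = key[len(PREFIX):]
        state_dict[stripped] = value
        if stripped.startswith(LM_PREFIX + "embed_tokens."):
            # Alias language_model.embed_tokens.* to a top-level embed_tokens.* key.
            state_dict[stripped[len(LM_PREFIX):]] = value
    return state_dict


example = {
    "model.language_model.embed_tokens.weight": "E",
    "model.visual.patch_embed.proj.weight": "P",
    "lm_head.weight": "H",
}
print(remap_keys(example))
```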


+"""End-to-end optimization pipeline for Qwen3.5-35B-A3B MoE VLM.
+
+Exports three sub-models (vision encoder, text embedding, text decoder),
+applies graph optimizations, and quantizes the text decoder to INT4 via Olive passes.
+
+Usage:
+    python optimize.py --config-dir cpu_and_mobile --device cpu
+    python optimize.py --config-dir cpu_and_mobile --device cpu --skip-export
+"""
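The usage lines show three flags: `--config-dir`, `--device`, and `--skip-export`. Below is a minimal `argparse` sketch of such an interface. It only describes the steps a run would take and does not execute anything. The defaults, the sub-model order, and the assumption that `--skip-export` reuses existing ONNX outputs and only runs the post-processing are guesses, not taken from the script. The names `build_parser`, `planned_steps`, and `SUBMODELS` are hypothetical.

```python
import argparse
from typing import List, Optional

SUBMODELS = ("vision", "embedding", "text")  # hypothetical run order


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Qwen3.5-35B-A3B ONNX pipeline (sketch)")
    parser.add_argument("--config-dir", default="cpu_and_mobile",
                        help="folder containing vision.json, embedding.json and text.json")
    parser.add_argument("--device", default="cpu", help="target device, e.g. cpu")
    parser.add_argument("--skip-export", action="store_true",
                        help="assumed meaning: reuse existing ONNX outputs and only post-process")
    return parser


def planned_steps(argv: Optional[List[str]] = None) -> List[str]:
    """Describe, without executing anything, which steps a run would perform."""
    args = build_parser().parse_args(argv)
    steps: List[str] = []
    if not args.skip_export:
        steps += [f"run Olive config {args.config_dir}/{name}.json on {args.device}"
                  for name in SUBMODELS]
    steps.append("patch genai_config.json and write processor_config.json")
    return steps


for step in planned_steps(["--config-dir", "cpu_and_mobile", "--device", "cpu"]):
    print(step)
```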


+# Copyright (C) 2026 Advanced Micro Devices, Inc. All rights reserved.
+# Portions of this file consist of AI generated content.
+# --------------------------------------------------------------------------
+# SPDX-License-Identifier: MIT


tanzeel-amd · 2026-05-08T10:43:52Z

@microsoft-github-policy-service agree company="AMD"

VishalX · 2026-05-11T10:06:50Z

@xieofxie / @devang-ml pls review

xieofxie · 2026-05-12T01:43:32Z

please wait for microsoft/onnxruntime-genai#2146

Ur Rahman and others added 7 commits April 14, 2026 03:25

- Add LICENSE (4b81437)
- Update Licence (7a7d1df)
- Update Licence (c27098d)
- Update Licence (d46c11b)
- Update Licence (5990df9)
- Add MIT license for new files (ff4d7d3)

Copilot AI review requested due to automatic review settings May 8, 2026 10:34

Copilot started reviewing on behalf of tanzeel-amd May 8, 2026 10:35

Copilot AI reviewed May 8, 2026


VishalX mentioned this pull request May 8, 2026

Add Qwen3.5-MoE (35B-A3B) model support microsoft/onnxruntime-genai#2146 (Open)


tanzeel-amd wants to merge 7 commits into microsoft:main from tanzeel-amd:turrahma/qwen3.5-moe-35B-A3B

4 participants
