
Fix accuracy issue and enable Arctic and Grok for arch tests #183

Merged
zhaixuejun1993 merged 3 commits into ravi9:dev_backend_openvino from zhaixuejun1993:xuejun/arch-test-gpt-oss
May 25, 2026

@zhaixuejun1993
Collaborator

This pull request updates the OpenVINO backend's operator routing logic to improve numerical stability for Mixture-of-Experts (MoE) and related operations, especially for Arctic-style and qwen3next models. The changes keep certain numerically sensitive operations on the CPU rather than offloading them to the GPU, which helps maintain accuracy and parity with reference implementations.
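Parity with a reference implementation is usually checked numerically. As a minimal sketch (not code from this PR), the functions below compute a normalized mean squared error (NMSE) between a reference output, for example one computed on the CPU, and a candidate output from another device; llama.cpp's test-backend-ops reports a similar NMSE figure when it compares a backend against the CPU. The within_parity helper and its 1e-7 tolerance are illustrative assumptions.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Normalized mean squared error between a reference output (e.g. from the CPU)
// and a candidate output (e.g. from the GPU):
//   nmse = sum((ref - out)^2) / sum(ref^2)
// Returns 0.0 for identical inputs and +infinity if the lengths differ.
// If the reference is all zeros, the unnormalized squared error is returned.
double nmse(const std::vector<float>& ref, const std::vector<float>& out) {
    if (ref.size() != out.size()) {
        return std::numeric_limits<double>::infinity();
    }
    double err = 0.0;
    double norm = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = r - static_cast<double>(out[i]);
        err += d * d;
        norm += r * r;
    }
    return norm > 0.0 ? err / norm : err;
}

// Hypothetical parity check: accept the candidate if its NMSE against the
// reference stays at or below a tolerance (1e-7 is an illustrative value).
bool within_parity(const std::vector<float>& ref, const std::vector<float>& out,
                   double max_nmse = 1e-7) {
    return ref.size() == out.size() && nmse(ref, out) <= max_nmse;
}
```

Normalizing by the reference energy keeps the threshold meaningful across tensors of very different magnitudes, which matters when comparing small routing probabilities and large activations with the same check.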

MoE and Routing Operations:

  • Force the MoE routing-weight gather (ffn_moe_weights), its normalization (ffn_moe_weights_norm), the routing probabilities (ffn_moe_probs), and the related reshape/add/sum/clamp ops to run on the CPU regardless of the requested device, avoiding numerical instability on the GPU, especially for Arctic-style and qwen3next models. [1] [2] [3] [4] [5]
  • Add logic to keep the MoE softmax and its related computation path on the CPU to restore numerical parity.
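The name-based routing described above can be sketched as a small predicate. This assumes that intermediate graph tensors are named "<base>-<layer>" (e.g. "ffn_moe_probs-12"), as llama.cpp's graph callback typically names them; the Device enum, base_name and route_moe_tensor are hypothetical helpers, not the backend's actual API. The related reshape/add/sum/clamp ops would need an extra rule (for example, checking whether one of their inputs is one of these tensors), which is omitted here.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>

enum class Device { CPU, GPU };

// Strip a trailing "-<layer>" suffix: "ffn_moe_probs-12" -> "ffn_moe_probs".
// Names without a purely numeric suffix are returned unchanged.
std::string base_name(const std::string& name) {
    const std::size_t dash = name.rfind('-');
    if (dash == std::string::npos || dash + 1 == name.size()) {
        return name;
    }
    for (std::size_t i = dash + 1; i < name.size(); ++i) {
        if (!std::isdigit(static_cast<unsigned char>(name[i]))) {
            return name;
        }
    }
    return name.substr(0, dash);
}

// Hypothetical routing rule: the MoE routing tensors named in this PR stay on
// the CPU whatever device was requested; every other tensor keeps the
// requested device.
Device route_moe_tensor(const std::string& name, Device requested) {
    static const std::unordered_set<std::string> cpu_only = {
        "ffn_moe_weights",
        "ffn_moe_weights_norm",
        "ffn_moe_probs",
    };
    return cpu_only.count(base_name(name)) ? Device::CPU : requested;
}
```

Matching on the exact base name, rather than a raw prefix, keeps the rule from accidentally capturing unrelated tensors whose names merely start with one of these strings.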

General Operator Routing and Stability:

  • Always keep SSM_CONV and certain MUL_MAT patterns on the CPU because of known numerical issues in OpenVINO's GPU path. [1] [2]
  • Add comments and guards for future fixes and parity restoration, e.g., for FLASH_ATTN_EXT and other ops. [1] [2]
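The op-type rules above can be sketched in the same style. The OpKind enum mirrors a few of ggml's GGML_OP_* values purely for illustration, and because the description does not say which MUL_MAT patterns are affected, the sketch takes that decision as a caller-supplied flag (mul_mat_pattern_flagged, a hypothetical parameter) instead of guessing. FLASH_ATTN_EXT falls through to the requested device because the description does not state how its guard routes it.

```cpp
enum class Device { CPU, GPU };

// Illustrative subset of op kinds, named after ggml's GGML_OP_* values.
enum class OpKind { SSM_CONV, MUL_MAT, FLASH_ATTN_EXT, OTHER };

// Hypothetical op-type routing rule:
// - SSM_CONV always runs on the CPU.
// - MUL_MAT runs on the CPU only when the caller flags it as one of the known
//   problematic patterns (pattern detection is left to the caller).
// - Every other op keeps the requested device.
Device route_by_op(OpKind op, Device requested, bool mul_mat_pattern_flagged = false) {
    switch (op) {
        case OpKind::SSM_CONV:
            return Device::CPU;
        case OpKind::MUL_MAT:
            return mul_mat_pattern_flagged ? Device::CPU : requested;
        default:
            return requested;
    }
}
```

Keeping op-type rules separate from the tensor-name rules makes each override easy to remove once the corresponding GPU path regains parity.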

These changes collectively improve model accuracy and reliability by ensuring that sensitive operations run on the CPU where necessary.

zhaixuejun1993 merged commit 48ef5fe into ravi9:dev_backend_openvino on May 25, 2026
3 of 14 checks passed
