[Ryzen AI 1.7.1] Phi-4-mini-instruct-onnx-ryzenai-npu fails with "flat version is not supported for matmulbias" #371

@MarkovChen1996

Issue Description

Summary

The official Phi-4-mini-instruct-onnx-ryzenai-npu model from HuggingFace fails to run on Ryzen AI 1.7.1 with the error: "flat version is not supported for matmulbias". This contradicts the Release Notes for 1.7, which list Phi-4-mini-instruct as a supported model.

Environment

  • Hardware: AMD Ryzen AI processor (PN54)
    • NPU Device: PCI\VEN_1022&DEV_17F0&REV_20 (Krackan - KRK)
    • NPU Status: OK
  • NPU Driver: 32.0.203.280 (as required)
  • Ryzen AI Software: 1.7.1 (installed via official MSI)
  • ONNX Runtime GenAI: 0.11.2
  • ONNX Runtime: 1.23.3.dev20260320
  • Operating System: Windows 11
  • Conda Environment: ryzen-ai-1.7.1 (created by installer)

Model Information

Steps to Reproduce

  1. Install Ryzen AI Software 1.7.1 with NPU driver 32.0.203.280
  2. Activate conda environment: conda activate ryzen-ai-1.7.1
  3. Download model:
    hf download amd/Phi-4-mini-instruct-onnx-ryzenai-npu --local-dir ./Phi-4-mini-instruct-onnx-ryzenai-npu
  4. Run inference:
    python "%RYZEN_AI_INSTALLATION_PATH%\LLM\example\model_chat.py" -m ./Phi-4-mini-instruct-onnx-ryzenai-npu -pr prompt.txt

Error Output

DEPRECATED session option was used (config_entries): use 'session_options' directly instead.
Failed to initialize fusion runtime for node 'MatMulNBits_2_0': [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.
Details:
  OpName: MatMulNBits_2_0
  OpType: MladfMatMul
  Provided arg dtypes: 0:bfloat16 1:int8 2:float 3:float 4:int8 5:bfloat16 
  Error: [C:\Users\z1aiebuild\dod\src\ops\llm_ops\llm_common\llm_common.cpp:162] [ERROR] flat version is not supported for matmulbias

RuntimeError: Exception during initialization: [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.

Root Cause Analysis

  1. Model uses "flat version" compilation format

    • The model's .cache/MatMulNBits_2_0_meta.json contains "op_version": "flat"
    • It includes operators: FLATMHA, FlatRMSAdd, FlatMLP
  2. Runtime library does not support flat version

    • Error originates from dyn_bins.dll (llm_common.cpp:162)
    • This appears to be a hard-coded limitation in the current release
  3. Configuration mismatch

    • Model's genai_config.json uses deprecated config_entries format
    • Even after converting to provider_options format, the same error persists
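The first finding can be rechecked on any downloaded model with a short script. This is a minimal sketch, not an AMD tool: it assumes the metadata files sit directly under the model's .cache/ directory and are named *_meta.json (as MatMulNBits_2_0_meta.json is here), and it searches the whole JSON tree for an "op_version" key because the exact layout of these files is not documented. The function names are hypothetical.

```python
import json
import sys
from pathlib import Path


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def flat_meta_files(model_dir):
    """Return *_meta.json files under <model_dir>/.cache that declare op_version "flat"."""
    hits = []
    cache_dir = Path(model_dir) / ".cache"
    if not cache_dir.is_dir():
        return hits
    for meta in sorted(cache_dir.glob("*_meta.json")):
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip rather than guess
        if "flat" in find_key(data, "op_version"):
            hits.append(meta)
    return hits


if __name__ == "__main__":
    # Usage: python check_flat.py ./Phi-4-mini-instruct-onnx-ryzenai-npu
    model_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in flat_meta_files(model_dir):
        print(f"flat op_version found in {path}")
```

An empty result only means no such metadata file was found; it does not prove the model will load.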

Attempted Solutions (All Failed)

✅ Verified environment setup is correct (PATH, DLLs, conda environment)
✅ Updated genai_config.json from config_entries to provider_options format
✅ Checked model file integrity (all files downloaded correctly)
✅ Tested various environment variables (no effect)
❌ Cannot enable flat version support; this appears to be a runtime library limitation

Comparison with Working Model

SmolLM2-135M-Instruct_rai_1.7.1_npu_4K (✅ Works):

  • Uses provider_options format
  • Uses full.onnx (not fusion.onnx)
  • Uses dd_metastate_Llm_Token_Token_rms_norm_* files
  • No .cache/ directory with flat version metadata

Phi-4-mini-instruct-onnx-ryzenai-npu (❌ Fails):

  • Uses deprecated config_entries format
  • Uses fusion.onnx
  • Uses dd_metastate_Llm_Token_MatMulNBits_2_0.* files
  • Contains .cache/MatMulNBits_2_0_meta.json with "op_version": "flat"
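The config-format difference between the two models can be checked without opening the files by hand. The sketch below reports where the keys config_entries and provider_options occur in a model's genai_config.json. It deliberately makes no assumption about where in the file these keys live and simply walks the whole JSON tree; key_paths and config_style are hypothetical helper names.

```python
import json
import sys
from pathlib import Path


def key_paths(node, wanted, prefix=""):
    """Return dotted paths of every occurrence of the wanted keys in nested JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            path = f"{prefix}.{k}" if prefix else k
            if k in wanted:
                found.append(path)
            found.extend(key_paths(v, wanted, path))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(key_paths(item, wanted, f"{prefix}[{i}]"))
    return found


def config_style(model_dir):
    """List where config_entries / provider_options appear in genai_config.json."""
    text = (Path(model_dir) / "genai_config.json").read_text(encoding="utf-8")
    return key_paths(json.loads(text), {"config_entries", "provider_options"})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python config_style.py <model_dir> [<model_dir> ...]
    for model_dir in sys.argv[1:]:
        print(model_dir, config_style(model_dir))
```

Running it against both model directories gives a side-by-side view of which format each one uses.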

Questions for AMD

  1. Is flat version support planned for Ryzen AI 1.7.1?

    • If yes, what is the required NPU firmware/driver version?
    • If no, why is Phi-4-mini listed in Release Notes 1.7 as a supported model?
  2. Is there an alternative Phi-4-mini model without flat version?

    • We noticed Phi-4-mini-instruct_rai_1.7.1_npu_4K and Phi-4-mini-instruct_rai_1.7.1_npu_16K collections exist
    • Are these non-flat-version builds?
  3. Should this model be moved to a different collection?

    • The model appears to belong to Collection V2 (based on compilation format)
    • But it's listed in Collection V1
  4. Documentation clarity

    • No official documentation mentions flat version requirements
    • Can you add firmware/runtime version requirements to model cards?

Expected Behavior

The model should run successfully as documented in Release Notes 1.7, or the documentation should clearly state the limitations and requirements.

Actual Behavior

Model fails with "flat version is not supported" error, despite being listed as a supported model.

Additional Context

  • Other Collection V2 models (Llama-3.2, Qwen2.5) show the same error
  • This suggests a systematic compatibility issue between model compilation format and runtime library
  • Users have no way to know which models are compatible before downloading (multi-GB downloads)

Suggested Fix

  1. Update runtime libraries to support flat version format, OR
  2. Provide non-flat-version Phi-4-mini model builds, OR
  3. Clearly document in model cards which runtime versions are required, OR
  4. Move incompatible models to a separate collection with clear warnings

Test Environment Verification

# NPU Device
Get-PnpDevice -FriendlyName "*NPU*"
# Output: NPU Compute Accelerator Device - OK

# NPU Hardware ID
pnputil /enum-devices /bus PCI /deviceids | Select-String "17F0"
# Output: PCI\VEN_1022&DEV_17F0&SUBSYS_17F01022&REV_20

# ONNX Runtime GenAI Version
python -c "import onnxruntime_genai as og; print(og.__version__)"
# Output: 0.11.2

# ONNX Runtime Version
python -c "import onnxruntime as ort; print(ort.__version__); print(ort.get_available_providers())"
# Output: 1.23.3.dev20260320
# Providers: ['VitisAIExecutionProvider', 'DmlExecutionProvider', 'CPUExecutionProvider']
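For anyone trying to reproduce this, the two Python checks above can be combined into one script that also copes with a missing package. This is a convenience sketch only: package_report is a hypothetical helper, and it relies on nothing beyond the __version__ attributes and the get_available_providers() call already used above.

```python
import importlib


def package_report(names=("onnxruntime_genai", "onnxruntime")):
    """Map each package name to its __version__, or a note if it cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in package_report().items():
        print(f"{name}: {version}")
    try:
        import onnxruntime as ort
        print("providers:", ort.get_available_providers())
    except ImportError:
        print("providers: onnxruntime not importable")
```

Run it inside the activated ryzen-ai-1.7.1 conda environment so that it reports the same packages the failing inference run uses.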

Impact

This issue affects users who:

  • Follow official AMD documentation and Release Notes
  • Download models from official AMD HuggingFace collections
  • Expect advertised models to work with the recommended software version

Severity: High - Core advertised functionality does not work as documented
