Issue Description
Summary
The official Phi-4-mini-instruct-onnx-ryzenai-npu model from HuggingFace fails to run on Ryzen AI 1.7.1 with the error "flat version is not supported for matmulbias". This contradicts the Ryzen AI 1.7 Release Notes, which list Phi-4-mini-instruct as a supported model.
Environment
- Hardware: AMD Ryzen AI processor (PN54)
- NPU Device: PCI\VEN_1022&DEV_17F0&REV_20 (Krackan - KRK)
- NPU Status: OK
- NPU Driver: 32.0.203.280 (as required)
- Ryzen AI Software: 1.7.1 (installed via official MSI)
- ONNX Runtime GenAI: 0.11.2
- ONNX Runtime: 1.23.3.dev20260320
- Operating System: Windows 11
- Conda Environment: ryzen-ai-1.7.1 (created by installer)
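Model Information
- Model: amd/Phi-4-mini-instruct-onnx-ryzenai-npu
- Model files: fusion.onnx + fusion.onnx.data, prefill.pb.bin
- .cache/MatMulNBits_2_0_meta.json (contains "op_version": "flat")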
Steps to Reproduce
- Install Ryzen AI Software 1.7.1 with NPU driver 32.0.203.280
- Activate conda environment:
conda activate ryzen-ai-1.7.1
- Download model:
hf download amd/Phi-4-mini-instruct-onnx-ryzenai-npu --local-dir ./Phi-4-mini-instruct-onnx-ryzenai-npu
- Run inference:
python "%RYZEN_AI_INSTALLATION_PATH%\LLM\example\model_chat.py" -m ./Phi-4-mini-instruct-onnx-ryzenai-npu -pr prompt.txt
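The reproduction step can also be scripted so the failure is detected automatically, which helps when re-testing after driver or runtime updates. This is a minimal sketch, assuming the model_chat.py location and the -m/-pr flags shown above; the helper names build_command and hits_flat_version_error are hypothetical and are not part of the Ryzen AI tooling.

```python
import os
import subprocess
import sys
from pathlib import Path

# Substring copied verbatim from the error output reported in this issue.
FLAT_ERROR = "flat version is not supported for matmulbias"


def build_command(model_dir, prompt_file, install_path=None):
    """Build the model_chat.py invocation from the reproduction steps (hypothetical helper)."""
    install = install_path or os.environ["RYZEN_AI_INSTALLATION_PATH"]
    script = Path(install) / "LLM" / "example" / "model_chat.py"
    # sys.executable is the python of the currently activated conda environment.
    return [sys.executable, str(script), "-m", str(model_dir), "-pr", str(prompt_file)]


def hits_flat_version_error(output):
    """Return True if the captured stdout/stderr contains the flat-version error."""
    return FLAT_ERROR in output


if __name__ == "__main__":
    cmd = build_command("./Phi-4-mini-instruct-onnx-ryzenai-npu", "prompt.txt")
    result = subprocess.run(cmd, capture_output=True, text=True)
    combined = result.stdout + result.stderr
    print("exit code:", result.returncode)
    print("flat-version error:", hits_flat_version_error(combined))
```

Run it from the activated ryzen-ai-1.7.1 environment so that RYZEN_AI_INSTALLATION_PATH is set.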
Error Output
DEPRECATED session option was used (config_entries): use 'session_options' directly instead.
Failed to initialize fusion runtime for node 'MatMulNBits_2_0': [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.
Details:
OpName: MatMulNBits_2_0
OpType: MladfMatMul
Provided arg dtypes: 0:bfloat16 1:int8 2:float 3:float 4:int8 5:bfloat16
Error: [C:\Users\z1aiebuild\dod\src\ops\llm_ops\llm_common\llm_common.cpp:162] [ERROR] flat version is not supported for matmulbias
RuntimeError: Exception during initialization: [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.
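To compare this failure with the one seen on other models, it helps to pull the structured fields out of the log. The sketch below is a hypothetical parser keyed only to the lines shown above (OpName, OpType, "Provided arg dtypes" and the "[file.cpp:line] [ERROR] message" sites); other log formats are not assumed to match.

```python
import re

# "Key: value" lines from the "Details:" block printed by OpBuilder::create().
_FIELD = re.compile(r"^\s*(OpName|OpType|Provided arg dtypes):\s*(.+?)\s*$", re.MULTILINE)
# "[<path>\<file>.cpp:<line>] [ERROR] <message>"; only the file's base name is kept.
_SITE = re.compile(r"\[(?:[^\]]*[\\/])?([\w.]+\.cpp):(\d+)\] \[ERROR\] ([^\n]+)")


def parse_op_error(log):
    """Split a fusion-runtime failure log into op metadata and error sites."""
    fields = dict(_FIELD.findall(log))
    arg_dtypes = {}
    for item in fields.get("Provided arg dtypes", "").split():
        index, dtype = item.split(":", 1)
        arg_dtypes[int(index)] = dtype
    # Each site is (source file, line number, message), in the order they appear.
    errors = [(name, int(line), msg.strip()) for name, line, msg in _SITE.findall(log)]
    return {
        "op_name": fields.get("OpName"),
        "op_type": fields.get("OpType"),
        "arg_dtypes": arg_dtypes,
        "errors": errors,
    }
```

Applied to the output above, the innermost site appears to be llm_common.cpp:162, while op_builder.cpp:114 is the wrapper that reports OpBuilder::create() failing.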
Root Cause Analysis
- Model uses "flat version" compilation format
  - The model's .cache/MatMulNBits_2_0_meta.json contains "op_version": "flat"
  - It includes the operators FLATMHA, FlatRMSAdd, and FlatMLP
- Runtime library does not support flat version
  - The error originates from dyn_bins.dll (llm_common.cpp:162)
  - This appears to be a hard-coded limitation in the current release
- Configuration mismatch
  - The model's genai_config.json uses the deprecated config_entries format
  - Even after converting it to the provider_options format, the same error persists
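Since the flat-version marker lives in the .cache metadata, a downloaded model can be checked for it before running inference. This is a minimal sketch; find_flat_ops is a hypothetical helper, and because the issue only states that the meta file "contains" op_version, the code searches for that key at any depth rather than assuming a fixed JSON schema.

```python
import json
import sys
from pathlib import Path


def _op_versions(obj):
    """Yield every value stored under an "op_version" key, at any nesting depth."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "op_version":
                yield value
            yield from _op_versions(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from _op_versions(item)


def find_flat_ops(model_dir):
    """Return the names of .cache/*_meta.json files that declare op_version "flat"."""
    cache = Path(model_dir) / ".cache"
    if not cache.is_dir():
        return []
    flat = []
    for meta in sorted(cache.glob("*_meta.json")):
        try:
            data = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or non-JSON file: skip it instead of aborting the scan
        if "flat" in _op_versions(data):
            flat.append(meta.name)
    return flat


if __name__ == "__main__":
    hits = find_flat_ops(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("flat-version meta files:", ", ".join(hits) if hits else "none")
```

For the failing model this would be expected to list MatMulNBits_2_0_meta.json; for the SmolLM2 build, which has no .cache/ directory, it prints "none".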
Attempted Solutions (All Failed)
✅ Verified environment setup is correct (PATH, DLLs, conda environment)
✅ Updated genai_config.json from config_entries to provider_options format
✅ Checked model file integrity (all files downloaded correctly)
✅ Tested various environment variables (no effect)
❌ Cannot enable flat version support - appears to be runtime library limitation
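When editing genai_config.json by hand, a stale config_entries block can easily survive next to the new provider_options one. The sketch below lists every place either key appears, without assuming where the Ryzen AI config nests them; find_keys and summarize are hypothetical helper names.

```python
import json
import sys
from pathlib import Path

SESSION_KEYS = ("config_entries", "provider_options")


def find_keys(obj, keys=SESSION_KEYS, path="$"):
    """Yield (json_path, key) for every occurrence of one of `keys`, at any depth."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}"
            if key in keys:
                yield child, key
            yield from find_keys(value, keys, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_keys(item, keys, f"{path}[{i}]")


def summarize(config_file):
    """Map each session-option key to the JSON paths where it appears in the file."""
    data = json.loads(Path(config_file).read_text(encoding="utf-8"))
    found = {key: [] for key in SESSION_KEYS}
    for json_path, key in find_keys(data):
        found[key].append(json_path)
    return found


if __name__ == "__main__":
    config = sys.argv[1] if len(sys.argv) > 1 else "genai_config.json"
    for key, paths in summarize(config).items():
        print(f"{key}: {paths or 'absent'}")
```

After a complete conversion, config_entries should report "absent"; if it still shows up, the deprecation warning in the error output is expected to persist.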
Comparison with Working Model
SmolLM2-135M-Instruct_rai_1.7.1_npu_4K (✅ Works):
- Uses provider_options format
- Uses full.onnx (not fusion.onnx)
- Uses dd_metastate_Llm_Token_Token_rms_norm_* files
- No .cache/ directory with flat version metadata
Phi-4-mini-instruct-onnx-ryzenai-npu (❌ Fails):
- Uses deprecated config_entries format
- Uses fusion.onnx
- Uses dd_metastate_Llm_Token_MatMulNBits_2_0.* files
- Contains .cache/MatMulNBits_2_0_meta.json with "op_version": "flat"
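The file-layout differences listed above can be checked mechanically for any two downloaded model folders. This is a sketch under the assumption that the marker files may sit anywhere inside the model directory (the issue does not say where the dd_metastate files live), so it searches recursively; fingerprint and layout_diff are hypothetical helper names. The config-format difference is not covered here.

```python
import sys
from pathlib import Path

# File-layout markers from the comparison above, as glob patterns searched recursively.
MARKERS = {
    "full.onnx": "full.onnx",
    "fusion.onnx": "fusion.onnx",
    "rms_norm metastate": "dd_metastate_Llm_Token_Token_rms_norm_*",
    "MatMulNBits metastate": "dd_metastate_Llm_Token_MatMulNBits_2_0.*",
    ".cache meta files": ".cache/*_meta.json",
}


def fingerprint(model_dir):
    """Return {marker: bool}: True if any file under model_dir matches the marker's pattern."""
    root = Path(model_dir)
    return {name: any(root.rglob(pattern)) for name, pattern in MARKERS.items()}


def layout_diff(dir_a, dir_b):
    """Return {marker: (in_a, in_b)} for markers present in only one of the two directories."""
    fa, fb = fingerprint(dir_a), fingerprint(dir_b)
    return {name: (fa[name], fb[name]) for name in MARKERS if fa[name] != fb[name]}


if __name__ == "__main__":
    for name, (a, b) in layout_diff(sys.argv[1], sys.argv[2]).items():
        print(f"{name}: {'yes' if a else 'no'} vs {'yes' if b else 'no'}")
```

Passing the SmolLM2 folder first and the Phi-4-mini folder second should reproduce the four layout differences described above.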
Questions for AMD
- Is flat version support planned for Ryzen AI 1.7.1?
  - If yes, what is the required NPU firmware/driver version?
  - If no, why is Phi-4-mini listed in Release Notes 1.7 as a supported model?
- Is there an alternative Phi-4-mini model without flat version?
  - We noticed Phi-4-mini-instruct_rai_1.7.1_npu_4K and Phi-4-mini-instruct_rai_1.7.1_npu_16K collections exist
  - Are these non-flat-version builds?
- Should this model be moved to a different collection?
  - The model appears to belong to Collection V2 (based on compilation format)
  - But it's listed in Collection V1
- Documentation clarity
  - No official documentation mentions flat version requirements
  - Can you add firmware/runtime version requirements to model cards?
Expected Behavior
The model should run successfully as documented in Release Notes 1.7, or the documentation should clearly state the limitations and requirements.
Actual Behavior
Model fails with "flat version is not supported" error, despite being listed as a supported model.
Additional Context
- Other Collection V2 models (Llama-3.2, Qwen2.5) show the same error
- This suggests a systematic compatibility issue between model compilation format and runtime library
- Users have no way to know which models are compatible before downloading (multi-GB downloads)
Suggested Fix
- Update runtime libraries to support flat version format, OR
- Provide non-flat-version Phi-4-mini model builds, OR
- Clearly document in model cards which runtime versions are required, OR
- Move incompatible models to a separate collection with clear warnings
Test Environment Verification
# NPU Device
Get-PnpDevice -FriendlyName "*NPU*"
# Output: NPU Compute Accelerator Device - OK
# NPU Hardware ID
pnputil /enum-devices /bus PCI /deviceids | Select-String "17F0"
# Output: PCI\VEN_1022&DEV_17F0&SUBSYS_17F01022&REV_20
# ONNX Runtime GenAI Version
python -c "import onnxruntime_genai as og; print(og.__version__)"
# Output: 0.11.2
# ONNX Runtime Version
python -c "import onnxruntime as ort; print(ort.__version__); print(ort.get_available_providers())"
# Output: 1.23.3.dev20260320
# Providers: ['VitisAIExecutionProvider', 'DmlExecutionProvider', 'CPUExecutionProvider']
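The Python-side checks above can be gathered into one report from inside the activated conda environment, which is convenient to paste into a bug report. A minimal sketch; collect_versions and available_providers are hypothetical helper names, and the script relies only on the __version__ attributes and ort.get_available_providers() call already used above. The PowerShell device checks are not reproduced.

```python
import importlib


def collect_versions(modules=("onnxruntime", "onnxruntime_genai")):
    """Return {module: version string}, or "not installed" if the import fails."""
    report = {}
    for name in modules:
        try:
            mod = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        report[name] = str(getattr(mod, "__version__", "unknown"))
    return report


def available_providers():
    """Return ONNX Runtime's execution providers, or [] if onnxruntime is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
    providers = available_providers()
    print("providers:", providers)
    print("VitisAIExecutionProvider available:", "VitisAIExecutionProvider" in providers)
```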
Impact
This issue affects users who:
- Follow official AMD documentation and Release Notes
- Download models from official AMD HuggingFace collections
- Expect advertised models to work with the recommended software version
Severity: High - Core advertised functionality does not work as documented