[Ryzen AI 1.7.1] Phi-4-mini-instruct-onnx-ryzenai-npu fails with "flat version is not supported for matmulbias" #371

## Issue Description

### Summary
The official Phi-4-mini-instruct-onnx-ryzenai-npu model from HuggingFace fails to run on Ryzen AI 1.7.1 with the error **"flat version is not supported for matmulbias"**. This contradicts the 1.7 Release Notes, which list Phi-4-mini-instruct as a supported model.

### Environment
- **Hardware**: AMD Ryzen AI processor (PN54)
  - NPU Device: `PCI\VEN_1022&DEV_17F0&REV_20` (Krackan - KRK)
  - NPU Status: OK
- **NPU Driver**: 32.0.203.280 (as required)
- **Ryzen AI Software**: 1.7.1 (installed via official MSI)
- **ONNX Runtime GenAI**: 0.11.2
- **ONNX Runtime**: 1.23.3.dev20260320
- **Operating System**: Windows 11
- **Conda Environment**: ryzen-ai-1.7.1 (created by installer)

### Model Information
- **Model**: amd/Phi-4-mini-instruct-onnx-ryzenai-npu
- **Source**: https://huggingface.co/amd/Phi-4-mini-instruct-onnx-ryzenai-npu
- **Collection**: Listed in [Ryzen-AI-1.7-NPU-LLM](https://huggingface.co/collections/amd/ryzen-ai-17-npu-llm-65e624c35fab664a1e53b48c)
- **Upload Date**: January 21, 2026
- **Model Files**: 
  - `fusion.onnx` + `fusion.onnx.data`
  - `prefill.pb.bin`
  - `.cache/MatMulNBits_2_0_meta.json` (contains `"op_version": "flat"`)
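
The `op_version` marker in the cache metadata can be checked mechanically. The following is a minimal Python sketch that relies only on what this report observed: metadata files sit under `.cache/` and match `*_meta.json`, and `"op_version": "flat"` appears somewhere in the JSON. The function names `find_flat_ops` and `_has_pair` are illustrative and are not part of any AMD tooling; the recursive lookup is an assumption made because the exact position of the key in the file is not documented here.

```python
import json
from pathlib import Path


def _has_pair(node, key, value):
    """Return True if `key: value` occurs anywhere in nested JSON data."""
    if isinstance(node, dict):
        if node.get(key) == value:
            return True
        return any(_has_pair(child, key, value) for child in node.values())
    if isinstance(node, list):
        return any(_has_pair(child, key, value) for child in node)
    return False


def find_flat_ops(model_dir):
    """List .cache/*_meta.json file names that declare op_version "flat"."""
    cache_dir = Path(model_dir) / ".cache"
    if not cache_dir.is_dir():
        return []
    flat = []
    for meta_path in sorted(cache_dir.glob("*_meta.json")):
        try:
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip rather than guess
        if _has_pair(meta, "op_version", "flat"):
            flat.append(meta_path.name)
    return flat
```

Calling `find_flat_ops("./Phi-4-mini-instruct-onnx-ryzenai-npu")` on the downloaded model directory would list every metadata file carrying the marker; an empty list means no such marker was found, not that the model is guaranteed to load.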

### Steps to Reproduce

1. Install Ryzen AI Software 1.7.1 with NPU driver 32.0.203.280
2. Activate conda environment: `conda activate ryzen-ai-1.7.1`
3. Download model:
   ```powershell
   hf download amd/Phi-4-mini-instruct-onnx-ryzenai-npu --local-dir ./Phi-4-mini-instruct-onnx-ryzenai-npu
   ```
4. Run inference:
   ```powershell
   python "$env:RYZEN_AI_INSTALLATION_PATH\LLM\example\model_chat.py" -m ./Phi-4-mini-instruct-onnx-ryzenai-npu -pr prompt.txt
   ```

### Error Output

```
DEPRECATED session option was used (config_entries): use 'session_options' directly instead.
Failed to initialize fusion runtime for node 'MatMulNBits_2_0': [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.
Details:
  OpName: MatMulNBits_2_0
  OpType: MladfMatMul
  Provided arg dtypes: 0:bfloat16 1:int8 2:float 3:float 4:int8 5:bfloat16 
  Error: [C:\Users\z1aiebuild\dod\src\ops\llm_ops\llm_common\llm_common.cpp:162] [ERROR] flat version is not supported for matmulbias

RuntimeError: Exception during initialization: [C:\Users\z1aiebuild\dod\src\ops\op_builder.cpp:114] [ERROR] OpBuilder::create() failed.
```
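
For triage across several models, the relevant fields can be pulled out of this log automatically. The sketch below is a hypothetical helper (`parse_op_builder_error` is not an existing tool) and assumes the log keeps the `OpName:` / `OpType:` / `Error: [source] [ERROR] message` layout shown above; other runtime versions may format it differently.

```python
import re


def parse_op_builder_error(log_text):
    """Extract the failing op and the innermost error from an OpBuilder log."""
    fields = {}
    # "OpName: ..." and "OpType: ..." each appear on their own indented line.
    for key in ("OpName", "OpType"):
        match = re.search(rf"^\s*{key}:\s*(\S+)", log_text, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    # The indented "Error:" line carries the source location and the message.
    match = re.search(
        r"^\s*Error:\s*\[(?P<source>[^\]]+)\]\s*\[ERROR\]\s*(?P<message>.+)$",
        log_text,
        re.MULTILINE,
    )
    if match:
        fields["source"] = match.group("source")
        fields["message"] = match.group("message").strip()
    return fields
```

Applied to the output above, this yields `MatMulNBits_2_0` as the op name, `MladfMatMul` as the op type, and `flat version is not supported for matmulbias` as the message, which makes it easy to confirm whether other models fail at the same point.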

### Root Cause Analysis

1. **Model uses "flat version" compilation format**
   - The model's `.cache/MatMulNBits_2_0_meta.json` contains `"op_version": "flat"`
   - It includes operators: `FLATMHA`, `FlatRMSAdd`, `FlatMLP`

2. **Runtime library does not support flat version**
   - Error originates from `dyn_bins.dll` (`llm_common.cpp:162`)
   - This appears to be a **hard-coded limitation** in the current release, since no configuration change or environment variable had any effect on it

3. **Configuration mismatch**
   - Model's `genai_config.json` uses deprecated `config_entries` format
   - Even after converting to `provider_options` format, the same error persists
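
To confirm which format a given `genai_config.json` uses, the file can be searched for the `config_entries` key. The sketch below is an illustration only: the nesting of `genai_config.json` is not shown in this report, so it walks the whole document instead of assuming a fixed path, and the helper names `find_key_paths` and `deprecated_option_paths` are hypothetical.

```python
import json


def find_key_paths(node, key, prefix=""):
    """Yield the dotted path of every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, child in node.items():
            path = f"{prefix}.{name}" if prefix else name
            if name == key:
                yield path
            yield from find_key_paths(child, key, path)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_key_paths(child, key, f"{prefix}[{index}]")


def deprecated_option_paths(config_path):
    """Return every location of `config_entries` in a genai_config.json file."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    return list(find_key_paths(config, "config_entries"))
```

A non-empty result from `deprecated_option_paths("genai_config.json")` shows where the deprecated key still lives. As noted above, removing it did not change the outcome here, so this check only rules the configuration format out as the cause.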

### Attempted Solutions (All Failed)

- ✅ Verified that the environment setup is correct (PATH, DLLs, conda environment)
- ✅ Updated `genai_config.json` from the `config_entries` format to the `provider_options` format
- ✅ Checked model file integrity (all files downloaded correctly)
- ✅ Tested various environment variables (no effect)
- ❌ **Cannot enable flat version support**; this appears to be a runtime library limitation

### Comparison with Working Model

**SmolLM2-135M-Instruct_rai_1.7.1_npu_4K** (✅ Works):
- Uses `provider_options` format
- Uses `full.onnx` (not `fusion.onnx`)
- Uses `dd_metastate_Llm_Token_Token_rms_norm_*` files
- **No** `.cache/` directory with flat version metadata

**Phi-4-mini-instruct-onnx-ryzenai-npu** (❌ Fails):
- Uses deprecated `config_entries` format
- Uses `fusion.onnx`
- Uses `dd_metastate_Llm_Token_MatMulNBits_2_0.*` files
- **Contains** `.cache/MatMulNBits_2_0_meta.json` with `"op_version": "flat"`

### Questions for AMD

1. **Is flat version support planned for Ryzen AI 1.7.1?**
   - If yes, what is the required NPU firmware/driver version?
   - If no, why is Phi-4-mini listed in Release Notes 1.7 as a supported model?

2. **Is there an alternative Phi-4-mini model without flat version?**
   - We noticed `Phi-4-mini-instruct_rai_1.7.1_npu_4K` and `Phi-4-mini-instruct_rai_1.7.1_npu_16K` collections exist
   - Are these non-flat-version builds?

3. **Should this model be moved to a different collection?**
   - The model appears to belong to Collection V2 (based on compilation format)
   - But it's listed in Collection V1

4. **Documentation clarity**
   - No official documentation mentions flat version requirements
   - Can you add firmware/runtime version requirements to model cards?

### Expected Behavior
The model should run successfully as documented in Release Notes 1.7, or the documentation should clearly state the limitations and requirements.

### Actual Behavior
The model fails with the "flat version is not supported for matmulbias" error, despite being listed as a supported model.

### Additional Context
- Other Collection V2 models (Llama-3.2, Qwen2.5) show the same error
- This suggests a systemic incompatibility between the model compilation format and the runtime library
- Users have no way to know which models are compatible before downloading (multi-GB downloads)
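
Until model cards state their requirements, a rough pre-download heuristic is to inspect a repository's file listing for the layout markers that distinguished the working and failing models in this report. The sketch below is an assumption-laden illustration: `layout_markers` is a hypothetical helper, and the presence of `.cache/*_meta.json` or `fusion.onnx` is only a hint based on the two models compared here, not a documented compatibility rule.

```python
from pathlib import PurePosixPath


def layout_markers(filenames):
    """Summarize layout markers from a list of repo-relative file paths."""
    # Normalize Windows separators so local listings and hub listings match.
    paths = [PurePosixPath(name.replace("\\", "/")) for name in filenames]
    return {
        "has_fusion_onnx": any(p.name == "fusion.onnx" for p in paths),
        "has_full_onnx": any(p.name == "full.onnx" for p in paths),
        "cache_meta_files": sorted(
            str(p)
            for p in paths
            if ".cache" in p.parts and p.name.endswith("_meta.json")
        ),
    }
```

The file list can come from a local directory walk or from `huggingface_hub.list_repo_files(repo_id)`, which returns repo-relative paths without downloading the weights. A listing that shows cache metadata files would still need the small JSON file fetched and inspected to confirm the `"op_version": "flat"` value.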

### Suggested Fix
1. Update runtime libraries to support flat version format, OR
2. Provide non-flat-version Phi-4-mini model builds, OR
3. Clearly document in model cards which runtime versions are required, OR
4. Move incompatible models to a separate collection with clear warnings

---

## Test Environment Verification

```powershell
# NPU Device
Get-PnpDevice -FriendlyName "*NPU*"
# Output: NPU Compute Accelerator Device - OK

# NPU Hardware ID
pnputil /enum-devices /bus PCI /deviceids | Select-String "17F0"
# Output: PCI\VEN_1022&DEV_17F0&SUBSYS_17F01022&REV_20

# ONNX Runtime GenAI Version
python -c "import onnxruntime_genai as og; print(og.__version__)"
# Output: 0.11.2

# ONNX Runtime Version
python -c "import onnxruntime as ort; print(ort.__version__); print(ort.get_available_providers())"
# Output: 1.23.3.dev20260320
# Providers: ['VitisAIExecutionProvider', 'DmlExecutionProvider', 'CPUExecutionProvider']
```

---

## Impact
This issue affects users who:
- Follow official AMD documentation and Release Notes
- Download models from official AMD HuggingFace collections
- Expect advertised models to work with the recommended software version

**Severity**: High - Core advertised functionality does not work as documented

