
[RyzenAI 1.7.1 / OGA 0.11.2] NaN logits at 3rd decode step — Qwen-2.5-1.5B-Instruct NPU 16K #368

@arthurmougin


Target : github.com/amd/RyzenAI-SW / issues
Severity : High — model unusable for generation >2 tokens
Component : onnxruntime-genai VitisAI EP / ryzenai-dynamic-dispatch


Summary

When generating text with amd/Qwen-2.5_1.5B_Instruct_rai_1.7.1_npu_16K (OGA 0.11.2, NPU provider), all 151,936 logits become NaN at the 3rd decode step (after 2 correct tokens). All subsequent tokens are also NaN garbage. The first 2 generated tokens are always semantically correct. The model is completely unusable for any task requiring >2 output tokens.

Observed: "What is 2+2?" → "2+!!!!!!" (garbage from token 3)
Expected: "2+2=4" or "Two plus two equals four"


Environment

| Field | Value |
|---|---|
| Machine | AMD Ryzen AI 9 HX 370 |
| NPU | AMD XDNA2, PCI VEN_1022&DEV_17F0 |
| OS | Windows 11 Pro Build 26200 |
| NPU Driver | 32.0.203.329 (dated 04/12/2025) |
| XRT Status | ✅ PASS (DPU_2_ELF loading verified) |
| Python | 3.12.10 |
| onnxruntime-genai-directml-ryzenai | 0.11.2 |
| onnxruntime-vitisai | 1.23.3 |
| onnxruntime_providers_ryzenai | 0.11.1 |
| ryzenai-dynamic-dispatch | 1.7.1 |
| AMD RyzenAI-SW release | v1.7.1 (2026-03-27) |
| Model | amd/Qwen-2.5_1.5B_Instruct_rai_1.7.1_npu_16K |
| Snapshot hash | d83d847501eabe6301fcad8066363ffef775395f |

Prerequisites confirmed working

✅ 1 — DPU_2_ELF loads correctly

The NPU kernel DPU_2_ELF loads without error. Previous investigation confirmed:

  • XRT PASS with ryzenai-dynamic-dispatch 1.7.1
  • No "failed to load DPU" or "incompatible ELF" errors in ryzenai-server logs
  • The model loads, initializes, and warms up normally

✅ 2 — Chat template applied (ChatML format)

A tokenizer_config.json overlay injects the correct ChatML template. Verified with tokenizer.apply_chat_template():

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is 2+2?<|im_end|>
<|im_start|>assistant

✅ 3 — First 2 tokens semantically correct

Step 1 logits (after token 1 = "2") are fully finite with expected distribution:

Top logits at step 1:
  '+' (id=10):  26.000
  ' +' (id=489): 25.375
  ' plus' (id=5646): 23.625
  '2' (id=17): 22.750
  'plus' (id=3767): 20.875

Token 2 = "+" — semantically correct continuation of "2".
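
A listing like the one above can be produced by ranking the logit vector and keeping the largest finite entries. The sketch below is only an illustration: `top_k_logits` is a hypothetical helper, the input vector is toy data built from the five values reported above, and in a real run the input would be the 151,936-element vector returned by get_logits().

```python
import heapq
import math

def top_k_logits(logits, k=5):
    """Return the k largest finite logits as (token_id, value) pairs, best first."""
    finite = ((i, v) for i, v in enumerate(logits) if math.isfinite(v))
    return heapq.nlargest(k, finite, key=lambda pair: pair[1])

# Toy data: the five step-1 values reported above, placed at their token ids
# in an otherwise flat vector (the filler value -100.0 is an assumption).
sample = {10: 26.000, 489: 25.375, 5646: 23.625, 17: 22.750, 3767: 20.875}
logits = [-100.0] * 6000
for token_id, value in sample.items():
    logits[token_id] = value

for token_id, value in top_k_logits(logits):
    print(f"id={token_id}: {value:.3f}")
```

Filtering on `math.isfinite` first means the helper returns an empty list for an all-NaN vector, which makes the NaN onset easy to spot in a per-step log.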


Bug — NaN logits at step 2

Step-by-step logit trace

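import onnxruntime_genai as og
import numpy as np

# overlay_path: model folder containing the tokenizer_config.json overlay (prerequisite 2)
model = og.Model(overlay_path)   # NPU provider (RyzenAI EP)
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=32)

# apply_chat_template(): helper that wraps the question in the ChatML template shown above
prompt = apply_chat_template("What is 2+2?")
tokens = tokenizer.encode(prompt)

gen = og.Generator(model, params)
gen.append_tokens(tokens)   # feeds the prompt; no separate input_ids assignment is needed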

| Step | generate_next_token() → get_next_tokens()[0] | Text | get_logits() NaN count |
|---|---|---|---|
| 1 | 17 | "2" | 0 / 151936 (all finite) |
| 2 | 10 | "+" | 151936 / 151936 (all NaN) |
| 3 | 0 | "!" | 151936 / 151936 (all NaN) |
| 4 | 0 | "!" | 151936 / 151936 (all NaN) |
| 5 | 0 | "!" | 151936 / 151936 (all NaN) |

NaN onset: at step 2, all 151,936 logits are NaN.
Once NaN appears, all subsequent steps are also NaN, which suggests the KV cache is corrupted.
Token id=0 is selected by argmax(NaN), so the output is consistently "!" (greedy fallback).
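
Why an all-NaN vector yields token id 0 can be shown with a plain comparison-based argmax. This is a minimal sketch, assuming the greedy search behaves like such a loop; the runtime's actual implementation is not known here, and `greedy_argmax` is a hypothetical name. NumPy's argmax likewise returns index 0 for an all-NaN array.

```python
import math

def greedy_argmax(logits):
    """Comparison-based argmax: move to index i only if its value beats the best so far."""
    best = 0
    for i in range(1, len(logits)):
        # Any comparison involving NaN evaluates to False, so `best` never moves.
        if logits[i] > logits[best]:
            best = i
    return best

all_nan = [math.nan] * 8
print(greedy_argmax(all_nan))           # index 0, i.e. token id 0
print(greedy_argmax([0.5, 2.0, 1.0]))   # index 1 for ordinary finite input
```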

Reproducibility

  • Occurs for all tested prompts (ChatML full, short "2+2?", tiny "OK")
  • Occurs regardless of prompt length (4–29 prefix tokens)
  • Occurs with or without system message
  • Greedy decoding (top_k=1): NaN always at step 2
  • Sampling (top_k=40, top_p=1.0, temp=0.7): NaN still at step 2 in >90% of runs; occasionally a lucky token path avoids it for 1–2 more steps (not reliable)

max_tokens sweep results

| max_tokens | Output | Correct tokens |
|---|---|---|
| 1 | "2" | 1/1 ✅ |
| 2 | "2+" | 2/2 ✅ |
| 3 | "2+!" | 2/3 ❌ |
| 8 | "2+!!!!!!" | 2/8 ❌ |
| 16 | "2+!!!!!!!!!!!!!!" | 2/16 ❌ |

Provider isolation

CPU provider

config = og.Config(overlay_path)
config.clear_providers()  # Use CPU only
model = og.Model(config)  # FAILS at load time

Error:

Load model from (...)/model.onnx failed:
Node () Op (If) [TypeInferenceError] Graph attribute inferencing failed:
Fatal error: custom op registration (VitisAI EP) required

The model.onnx contains AMD VitisAI-specific If node subgraphs that cannot be parsed by the standard ORT CPU or DirectML EP. The model is architecturally NPU-only — there is no CPU fallback available.

DirectML provider

Same TypeInferenceError — cannot load without VitisAI EP registered.

Conclusion: The NaN bug can only be observed through the VitisAI EP / RyzenAI path. Because the model does not load on any other provider, it cannot be reproduced or diagnosed on CPU/DML.
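
The isolation attempts above follow one pattern: try to load the model under each provider and record what happens. The sketch below shows that pattern with an injected loader so it runs without the SDK. `probe_providers`, `fake_loader`, and the provider names are all hypothetical; a real probe would build an og.Config, select the provider, and call og.Model(config).

```python
def probe_providers(load_model, providers):
    """Try each provider; record None on success or the error text on failure."""
    outcome = {}
    for name in providers:
        try:
            load_model(name)
            outcome[name] = None
        except Exception as exc:  # a failed load is exactly the data we want to keep
            outcome[name] = f"{type(exc).__name__}: {exc}"
    return outcome

# Stand-in loader that mimics the behaviour reported above (an assumption,
# not the real SDK): anything other than the VitisAI path fails at load time.
def fake_loader(provider):
    if provider != "VitisAI":
        raise RuntimeError("custom op registration (VitisAI EP) required")

results = probe_providers(fake_loader, ["cpu", "dml", "VitisAI"])
for name, error in results.items():
    print(f"{name}: {'loads' if error is None else error}")
```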


Minimal reproduction script

#!/usr/bin/env python3
"""
Minimal reproduction — NaN logits at decode step 2
AMD Qwen-2.5-1.5B-Instruct RyzenAI 1.7.1 NPU 16K
Requires: onnxruntime-genai-directml-ryzenai 0.11.2 (AMD RyzenAI SDK)
"""
import onnxruntime_genai as og
import numpy as np
import json

OVERLAY_PATH = r"C:\Users\Arthur Mougin\AI-lab\overlays\qwen25-15b-rai171-npu16k-template-fixed"
CHAT_TEMPLATE = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
)

def run_repro(prompt_text: str, max_new_tokens: int = 6):
    model = og.Model(OVERLAY_PATH)
    tokenizer = og.Tokenizer(model)
    
    prompt = CHAT_TEMPLATE.format(prompt=prompt_text)
    tokens = tokenizer.encode(prompt)
    
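    params = og.GeneratorParams(model)
    # Cap total length at prompt length plus the requested number of new tokens.
    params.set_search_options(max_length=len(tokens) + max_new_tokens)

    gen = og.Generator(model, params)
    # The prompt is fed through append_tokens(); no separate input_ids assignment is needed.
    gen.append_tokens(tokens)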
    
    results = []
    step = 0
    while not gen.is_done():
        gen.generate_next_token()
        next_tok = int(gen.get_next_tokens()[0])
        logits = gen.get_logits()
        arr = np.array(logits[0][0])
        nan_count = int(np.isnan(arr).sum())
        finite_count = int(np.isfinite(arr).sum())
        results.append({
            "step": step,
            "token_id": next_tok,
            "token_text": tokenizer.decode([next_tok]),
            "nan_count": nan_count,
            "finite_count": finite_count,
            "all_nan": nan_count == len(arr),
        })
        step += 1
    
    return {
        "prompt": prompt_text,
        "steps": results,
        "nan_first_step": next((r["step"] for r in results if r["all_nan"]), None),
        "output": "".join(r["token_text"] for r in results),
    }

if __name__ == "__main__":
    result = run_repro("What is 2+2?")
    print(json.dumps(result, indent=2))
    print(f"\nNaN first at step: {result['nan_first_step']}")
    print(f"Output: {result['output']!r}")

Expected output with fixed runtime:

{ "nan_first_step": null, "output": "2+2=4" }

Actual output:

{ "nan_first_step": 1, "output": "2+!!!!!!" }

(nan_first_step is 0-indexed: step 1 is the 2nd call to generate_next_token(). The logits read after that call are the ones that select the 3rd token, hence "NaN at the 3rd decode step" in 1-indexed terms.)


Questions for AMD / maintainers

  1. Is this a known issue with OGA 0.11.2 + ryzenai-dynamic-dispatch 1.7.1 on XDNA2?

  2. Has this been fixed in an internal/development build? If so, is there a beta wheel or an ETA for public release?

  3. What is the suspected mechanism? Is it an INT4 overflow in the NPU KV cache operations after step 2, or missing numerical stabilization (e.g., a missing softmax temperature clamp)?

  4. Is there a workaround short of max_tokens≤2? For example: KV cache reset after every step? Explicit logit clamping? A different SearchOptions configuration?

  5. Is the Qwen-2.5-3B or Qwen-2.5-7B NPU model affected by the same bug? (We have not tested these, but if 1.5B is specifically affected, it could indicate a model-size-specific quantization issue.)

  6. Does this affect any other models at RyzenAI 1.7.1? (e.g., Phi-3.5, Llama-3.2)
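
Regarding question 4: until a real workaround exists, a caller can at least fail fast and discard the garbage tail instead of returning it. The sketch below does not fix the bug. It is a hedged illustration on toy data, with hypothetical helper names, and it relies on the behaviour seen in the trace above: the logits read at step N are the ones used to select token N+1.

```python
import math

def first_bad_step(logit_rows):
    """Return the 0-indexed step whose logits first contain NaN or infinity, else None."""
    for step, row in enumerate(logit_rows):
        if not all(math.isfinite(v) for v in row):
            return step
    return None

def trusted_text(token_texts, logit_rows):
    """Join only the tokens that were selected from finite logits.

    Assumption (taken from the trace in this report): the logits read at step N
    select token N+1, so the token emitted at the first bad step is still valid.
    """
    bad = first_bad_step(logit_rows)
    keep = len(token_texts) if bad is None else bad + 1
    return "".join(token_texts[:keep])

# Toy data shaped like the reported run: finite logits at step 0, NaN afterwards.
nan = math.nan
texts = ["2", "+", "!", "!"]
rows = [[26.0, 25.375], [nan, nan], [nan, nan], [nan, nan]]
print(first_bad_step(rows))       # 1
print(trusted_text(texts, rows))  # 2+
```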


Additional context

  • Model HF page: https://huggingface.co/amd/Qwen-2.5_1.5B_Instruct_rai_1.7.1_npu_16K
  • AMD RyzenAI-SW: https://github.com/amd/RyzenAI-SW
  • AMD RyzenAI release v1.7.1: https://github.com/amd/RyzenAI-SW/releases/tag/v1.7.1
  • OGA 0.11.2 wheel source: AMD internal distribution (not on public PyPI)
  • The onnxruntime-genai-directml-ryzenai 0.7.0.1 on PyPI is an older version (likely 1.5.x era)

Reported by: Windows 11 Pro, DESKTOP-4F35QQ5, Ryzen AI 9 HX 370 / XDNA2 | 2026-04-28
