Optimize FP8 gemm on XPU by xiangyuT · Pull Request #2 · analytics-zoo/ComfyUI

xiangyuT · 2026-03-19T08:49:42Z

No description provided.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Split the M dimension into chunks when oneDNN fails to create FP8 primitives for large M values (e.g. WAN 2.2 14B FFN layers with M=32760). In benchmarks, chunk_m=512 yields a 4-8% speedup over dequant+bf16 for FFN shapes.

Add a COMFY_XPU_FP8_OMNI_LOG env var with 3 levels: 0=off, 1=misses only (default), 2=verbose. Previously all logging was gated by a single bool.
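The two changes described in this commit message can be illustrated with a minimal sketch. It is not the code from this PR: the function names (`parse_log_level`, `chunked_matmul`, `gemm_with_fallback`), the use of `RuntimeError` as the "primitive creation failed" signal, and the choice to clamp or default unparsable log values are all assumptions made for illustration. A plain-Python matmul stands in for the FP8 GEMM kernel so the example runs without an XPU.

```python
import os


def parse_log_level(env=None):
    """Hypothetical reader for COMFY_XPU_FP8_OMNI_LOG.

    Levels from the commit message: 0=off, 1=misses only (default), 2=verbose.
    """
    env = os.environ if env is None else env
    raw = env.get("COMFY_XPU_FP8_OMNI_LOG", "1")
    try:
        level = int(raw)
    except ValueError:
        return 1  # assumption: unparsable values fall back to the default
    return min(max(level, 0), 2)  # assumption: out-of-range values are clamped


def matmul(a, b):
    """Reference (M x K) @ (K x N) on nested lists; stand-in for the FP8 kernel."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def chunked_matmul(a, b, chunk_m=512, kernel=matmul):
    """Run `kernel` on row slices of at most `chunk_m` rows and stack the results."""
    if chunk_m <= 0:
        raise ValueError("chunk_m must be positive")
    out = []
    for start in range(0, len(a), chunk_m):
        out.extend(kernel(a[start:start + chunk_m], b))
    return out


def gemm_with_fallback(a, b, kernel=matmul, chunk_m=512):
    """Try the full-M call first; split along M only if the kernel rejects it."""
    try:
        return kernel(a, b)
    except RuntimeError:  # assumption: failure surfaces as RuntimeError
        return chunked_matmul(a, b, chunk_m=chunk_m, kernel=kernel)
```

Because each output row depends only on the matching input row, splitting along M does not change the result. With M=32760 and chunk_m=512, the loop makes 64 kernel calls, the last one covering 504 rows.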

xiangyuT and others added 3 commits March 19, 2026 16:36

feat: add xpu fp8 omni linear integration

0fde103

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

test: cover mixed precision fp8 xpu integration

fb13e12

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

xiangyuT wants to merge 3 commits into dev from feature/omni-fp8-xpu-dev-pr
