Add initial NVIDIA GPU backend bring-up #32

Open

zhoubot wants to merge 7 commits into main from feat_gpu

Conversation

zhoubot commented Mar 31, 2026

Summary

This PR brings up an initial NVIDIA GPU backend for PTO Tile Lib, centered on DGX Spark / GB10 (sm121), and adds a real GPU test lane.

Included in this PR

  • initial CUDA GPU backend scaffolding and dispatch wiring
  • sm121 matmul fast-path groundwork
  • tensor-core WMMA matmul fast paths for half / bf16 on sm121
  • extended TMATMUL, TMATMUL_ACC, TMATMUL_BIAS, TMATMUL_MX, and TGEMV_MX GPU coverage
  • standalone CUDA correctness test lane under tests/gpu/st
  • lightweight GB10 matmul microbenchmark
  • GPU-specific swizzle tile layout (SLayout::GpuSwizzle128B) that is intentionally separate from NPU boxed layouts
  • larger 64x64x64 GEMM correctness tests for half / bf16

Notes

  • the float matmul still uses an inline-PTX FMA fallback path
  • the MX wrappers accept the scale tiles but reuse the existing GPU matmul path; scale semantics are not fully modeled yet
  • the GPU swizzle layout is groundwork for future shared-memory / tensor-core-friendly paths and is not yet consumed by the sm121 matmul fast path

Validation

Executed on the target GB10 / DGX Spark environment:

  • cmake --build build/tests/gpu-st -j4
  • ctest --output-on-failure
  • ./build/tests/gpu-st/testcase/pto_gpu_perf/pto_gpu_perf

Representative benchmark signal from GB10 (64x64x64, 1 block):

  • float: ~2.2036 ms, ~0.24 GFLOPS
  • half: ~0.0082 ms, ~63.86 GFLOPS
  • bf16: ~0.0082 ms, ~64.03 GFLOPS
