Refactor: migrate ABG and HBG examples to SceneTestCase under tests/st/#541

Merged
ChaoWao merged 1 commit into hw-native-sys:main from doraemonmj:pytest
Apr 15, 2026
@doraemonmj doraemonmj commented Apr 13, 2026

  • Move aicpu_build_graph examples (vector_example, bgemm, paged_attention) to tests/st/a2a3/aicpu_build_graph/ as @scene_test classes
  • Move host_build_graph examples (vector_example, matmul, bgemm, paged_attention) to tests/st/a2a3/host_build_graph/ as @scene_test classes
  • Convert paged_attention_unroll from golden.py to test_*.py format
  • Merge host_build_graph/paged_attention sim-scale cases from examples/ with production-scale cases from tests/st/ into single test file
  • Delete golden.py, kernel_config.py, READMEs, and docs from migrated example directories; keep C++ kernel sources under tests/st/
  • All cases include a2a3sim platform; config params preserved exactly

Full case table

| # | Runtime | Test name | Case | Location | Platform | dtype | Precision (R/A) | block_dim | thread_num | Kernels (func_id) | Inputs | Outputs | Migration status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | host | bgemm | default | examples/.../host_build_graph/bgemm/ | sim+onboard | fp32 | 1e-3/1e-3 | - | - | AIC: gemm_tile(0); AIV: tile_add(1) | A:[1,4,4,64,64] fp32, B:[1,4,4,64,64] fp32 | C:[1,4,4,64,64] fp32 | |
| 2 | host | vector_example | default | examples/.../host_build_graph/vector_example/ | sim+onboard | fp32 | 1e-5/1e-5 | 3 | 3 | AIV: add(0), add_scalar(1), mul(2) | a:[16384] fp32, b:[16384] fp32 | f:[16384] fp32 | |
| 3 | host | matmul | default | examples/.../host_build_graph/matmul/ | sim+onboard | fp16 | 1e-2/1e-2 | 3 | 3 | AIV: log_sqrt(0), add_exp(2); AIC: matmul(1) | a:[16384] fp16, w1:[16384] fp16, w2:[16384] fp16 | f:[16384] fp32 | |
| 4 | host | paged_attention | small1 | examples/.../host_build_graph/paged_attention/ | onboard | fp16 | 1e-2/1e-2 | 3 | 3 | AIC: qk_matmul(0), pv_matmul(2); AIV: softmax_prepare(1), online_update(3) | query:[1,16,16], kc:[1,16,1,16], vc:[1,16,1,16], bt:[1,16] i32, cl:[1] i32(=16), scale:1.0 | out:[1,16,16] fp32 | |
| 5 | host | paged_attention | small2 (manual) | examples/.../host_build_graph/paged_attention/ | onboard | fp16 | 1e-2/1e-2 | 3 | 3 | same as #4 | query:[1,16,16], kc:[4,16,1,16], vc:[4,16,1,16], bt:[1,16] i32, cl:[1] i32(=64), scale:1.0 | out:[1,16,16] fp32 | |
| 6 | host | paged_attention (st) | Case1 | tests/st/.../host_build_graph/paged_attention/ | onboard | bf16 | 1e-3/1e-3 | 24 | 3 | AIC: qk_matmul(0), pv_matmul(2); AIV: softmax_prepare(1), online_update(3) | query:[256,16,128], kc:[16384,128,1,128], vc:[16384,128,1,128], bt:[256,256] i32, cl:[256] i32(=8100), scale:1.0 | out:[256,16,128] fp32 | |
| 7 | host | paged_attention (st) | Case2 | tests/st/.../host_build_graph/paged_attention/ | onboard | bf16 | 1e-3/1e-3 | 24 | 3 | same as #6 | query:[64,64,128], kc:[8192,64,1,128], vc:[8192,64,1,128], bt:[64,512] i32, cl:[64] i32(=8150), scale:1.0 | out:[64,64,128] fp32 | |
| 8 | aicpu | bgemm | default | examples/.../aicpu_build_graph/bgemm/ | sim+onboard | fp32 | 1e-3/1e-3 | 3 | 4 | AIC: gemm_tile(0); AIV: tile_add(1) | A:[1,4,4,64,64] fp32, B:[1,4,4,64,64] fp32 | C:[1,4,4,64,64] fp32 | |
| 9 | aicpu | vector_example | default | examples/.../aicpu_build_graph/vector_example/ | sim+onboard | fp32 | 1e-5/1e-5 | 3 | 4 | AIV: add(0), add_scalar(1), mul(2) | a:[16384] fp32, b:[16384] fp32 | f:[16384] fp32 | |
| 10 | aicpu | paged_attention | case1 | examples/.../aicpu_build_graph/paged_attention/ | onboard | bf16 | 1e-3/1e-3 | 24 | 4 | AIC: qk_matmul(0), pv_matmul(2), hub(4); AIV: softmax_prepare(1), online_update(3), hub(5) | query:[256,16,128], kc:[16384,128,1,128], vc:[16384,128,1,128], bt:[256,256] i32, cl:[256] i32(=8192), scale:1.0 | out:[256,16,128] fp32 | |
| 11 | aicpu | paged_attention | case2 | examples/.../aicpu_build_graph/paged_attention/ | onboard | bf16 | 1e-3/1e-3 | 24 | 4 | same as #10 | query:[64,64,128], kc:[8192,64,1,128], vc:[8192,64,1,128], bt:[64,512] i32, cl:[64] i32(=8192), scale:1.0 | out:[64,64,128] fp32 | |
| 12 | aicpu | pa_unroll (st) | Case1 | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | onboard | bf16 | 1e-3/1e-3 | 24 | 4 | same as #10 | same as #10 | same as #10 | |
| 13 | aicpu | pa_unroll (st) | Case2 | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | onboard | bf16 | 1e-3/1e-3 | 24 | 4 | same as #10 | same as #11 | same as #11 | |
| 14 | aicpu | pa_unroll (st) | Case3 | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | onboard | bf16 | 1e-3/1e-3 | 24 | 4 | same as #10 | query:[64,64,256], kc:[8192,64,1,256], vc:[8192,64,1,256], bt:[64,512] i32, cl:[64] i32(=8192), scale:1.0 | out:[64,64,256] fp32 | |
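
For readers unfamiliar with the Precision (R/A) column, here is a minimal sketch of how a pair such as 1e-3/1e-3 could be applied when comparing kernel output against the golden result. It assumes R/A means relative/absolute tolerance combined as `|actual - expected| <= atol + rtol * |expected|`; the helper name `within_tolerance` is hypothetical, and the comparison the test framework actually performs is not shown in this PR.

```python
import math


def within_tolerance(actual, expected, rtol, atol):
    """Return True when every element satisfies |a - e| <= atol + rtol * |e|.

    Hypothetical helper for illustration only; not the framework's checker.
    """
    if len(actual) != len(expected):
        return False
    return all(
        math.isfinite(a) and abs(a - e) <= atol + rtol * abs(e)
        for a, e in zip(actual, expected)
    )


expected = [1.0, 2.0, 4.0]
actual = [1.0005, 2.001, 4.003]

# Passes under the looser 1e-3/1e-3 setting used by the bgemm and bf16 cases.
loose_ok = within_tolerance(actual, expected, rtol=1e-3, atol=1e-3)
# Fails under the tighter 1e-5/1e-5 setting used by vector_example.
tight_ok = within_tolerance(actual, expected, rtol=1e-5, atol=1e-5)
print(loose_ok, tight_ok)
```

Under this reading, tightening small1/small2 from 1e-2 to 1e-3 shrinks the allowed error by roughly a factor of ten per element.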

Changes after migration

examples/a2a3/ has been deleted entirely; all cases now live under tests/st/a2a3/.
paged_attention_small/ has been merged into paged_attention/ (small1/small2 become sub-cases).

| # | Runtime | Test name | Case | Location after migration | Changed fields | Before | After | Migration status |
|---|---|---|---|---|---|---|---|---|
| 1 | host | bgemm | default | tests/st/.../host_build_graph/bgemm/ | parameters | run_example defaults: aicpu 3, blockdim 24 | aicpu 3, blockdim 24 | ✅ migrated |
| 2 | host | vector_example | default | tests/st/.../host_build_graph/vector_example/ | (no change) | | | ✅ migrated |
| 3 | host | matmul | default | tests/st/.../host_build_graph/matmul/ | (no change) | | | ✅ migrated |
| 4 | host | paged_attention | small1 | tests/st/.../host_build_graph/paged_attention/ | platform, dtype, precision | fp16, 1e-2/1e-2 | bf16, 1e-3/1e-3, kernel adjusted | ✅ merged |
| 5 | host | paged_attention | small2 (manual) | tests/st/.../host_build_graph/paged_attention/ | platform, dtype, precision | fp16, 1e-2/1e-2 | bf16, 1e-3/1e-3, merged kernel adjustment | ✅ merged |
| 6 | host | paged_attention | Case1 | tests/st/.../host_build_graph/paged_attention/ | (no change) | | | ✅ migrated |
| 7 | host | paged_attention | Case2 (manual) | tests/st/.../host_build_graph/paged_attention/ | (no change) | | | ✅ migrated |
| 8 | aicpu | bgemm | default | tests/st/.../aicpu_build_graph/bgemm/ | (no change) | | | ✅ migrated |
| 9 | aicpu | vector_example | default | tests/st/.../aicpu_build_graph/vector_example/ | (no change) | | | ✅ migrated |
| 10 | aicpu | paged_attention | case1 | tests/st/.../aicpu_build_graph/paged_attention/ | (no change) | | | ✅ migrated |
| 11 | aicpu | paged_attention | case2 | tests/st/.../aicpu_build_graph/paged_attention/ | (no change) | | | ✅ migrated |
| 12 | aicpu | pa_unroll | Case1 | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | (no change) | | | ✅ migrated |
| 13 | aicpu | pa_unroll | Case2 (manual) | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | (no change) | | | ✅ migrated |
| 14 | aicpu | pa_unroll | Case3 (manual) | tests/st/.../aicpu_build_graph/paged_attention_unroll/ | (no change) | | | ✅ migrated |

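Migration change summary

  • examples/a2a3/ is deleted entirely; everything is now an @scene_test under tests/st/a2a3/
  • The standalone examples/host/paged_attention/ directory is deleted; small1/small2 are merged into the CASES list of paged_attention/
  • The original example's golden.py + kernel_config.py → CALLABLE + CASES + generate_args() + compute_golden() inside an @scene_test class
  • host bgemm now sets block_dim=3, thread_num=24 explicitly (the original example did not specify them and relied on the old framework's defaults)
  • small1/small2: dtype fp16→bf16, precision 1e-2→1e-3 (aligned with the large cases); the large-case kernels were adjusted so they also work for the small cases
  • All 14 of 14 cases are migrated; none were missed
  • Because a2a3 no longer has any examples under the host and aicpu build runtimes, the ptoisa clone would be skipped; (args.pto_isa_commit, args.clone_protocol) is now passed in _run_single_platform() in ci.py
  • Updated the case paths in verify_packaging.sh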
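
To illustrate the golden.py + kernel_config.py → @scene_test mapping summarized above, here is a minimal, self-contained sketch of the shape such a class might take. The `scene_test` decorator and `SceneTestCase` base class below are hypothetical stand-ins defined only so the snippet runs; the decorator keywords, the dictionary keys, the input values, and the golden formula are all assumptions and are not taken from the repository. Only the attribute and method names (CALLABLE, CASES, generate_args(), compute_golden()) and the vector_example parameters come from this PR.

```python
def scene_test(**meta):
    """Hypothetical decorator: attaches scene metadata to the test class."""
    def wrap(cls):
        cls.SCENE_META = dict(meta)
        return cls
    return wrap


class SceneTestCase:
    """Hypothetical base class standing in for the framework's real one."""
    CALLABLE = {}
    CASES = []

    def generate_args(self, params):
        raise NotImplementedError

    def compute_golden(self, args, params):
        raise NotImplementedError


# Keyword names here are assumptions, not the framework's real signature.
@scene_test(runtime="host_build_graph", platforms=["a2a3sim"])
class TestVectorExample(SceneTestCase):
    # Roughly what kernel_config.py used to hold: kernels and their func_ids.
    CALLABLE = {
        "kernels": [
            {"func_id": 0, "name": "add", "core": "aiv"},
            {"func_id": 1, "name": "add_scalar", "core": "aiv"},
            {"func_id": 2, "name": "mul", "core": "aiv"},
        ],
    }
    # Roughly what golden.py's per-case parameters used to hold.
    CASES = [
        {"name": "default", "size": 16384, "rtol": 1e-5, "atol": 1e-5,
         "block_dim": 3, "thread_num": 3},
    ]

    def generate_args(self, params):
        n = params["size"]
        # Illustrative inputs only; the real example's data is not shown here.
        return {"a": [float(i % 4) for i in range(n)], "b": [2.0] * n}

    def compute_golden(self, args, params):
        # Illustrative chain of the three kernels, not the example's formula.
        out = []
        for a, b in zip(args["a"], args["b"]):
            s = a + b          # add
            t = s + 1.0        # add_scalar
            out.append(s * t)  # mul
        return {"f": out}


case = TestVectorExample.CASES[0]
test = TestVectorExample()
golden = test.compute_golden(test.generate_args(case), case)
print(case["name"], len(golden["f"]), golden["f"][:3])
```

Keeping the kernel list, the case parameters, and the reference computation in one class is what lets the sim-scale and production-scale paged_attention cases share a single test file.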

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request refactors and consolidates several examples (BGEMM, Vector, Matmul, and Paged Attention) into a standardized testing framework using SceneTestCase. It removes legacy example directories and introduces new test scripts in the tests/st/a2a3/ directory. Review feedback identified critical signature mismatches in the TestPagedAttentionUnrollAicpuBuildGraph test class, specifically regarding missing scalar arguments and incorrect argument counts for the orchestration and kernel functions, including aiv_softmax_prepare.cpp and aiv_online_update.cpp.

@doraemonmj doraemonmj force-pushed the pytest branch 8 times, most recently from 8ab1933 to e2580ac on April 15, 2026 06:53
- Move aicpu_build_graph and host_build_graph cases from examples/
  to tests/st/, converting golden.py/kernel_config.py entries to
  SceneTestCase format
- Delete obsolete golden.py, kernel_config.py, README, and docs
- Add small-tile dispatch (<16,16,...>) to HBG paged attention
  kernels so the small1 case (head_dim=16, block_size=16) no longer
  segfaults from buffer overruns
@ChaoWao ChaoWao merged commit 638da68 into hw-native-sys:main Apr 15, 2026
15 checks passed