[TRITON] Add act_mul without quant (DO_QUANT), model configs, benchmarks by Chi-Chu319 · Pull Request #2592 · ROCm/aiter

Chi-Chu319 · 2026-04-02T11:54:30Z

- Fix act_mul in aiter/ops/triton/activation.py to handle higher-rank inputs ([..., 2*d]) by flattening leading dims before the kernel call and restoring the original batch shape on return
- Update docstring to document the reshape behavior
- Fix torch_act_mul_ref in tests to use ... indexing for higher-rank support
- Add test_act_mul_no_quant_higher_rank test covering 3D and 4D inputs
- Run parallel validation
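The flatten/restore behavior described above can be illustrated with a minimal pure-Python sketch. It assumes a contiguous row-major buffer of shape [..., 2*d] and models the 2-D kernel as a per-row loop. `act_mul_flat` and `silu` are hypothetical names, not aiter's API. The real wrapper reshapes torch tensors before launching a Triton kernel.

```python
import math


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def act_mul_flat(data, shape, act=silu):
    """Hypothetical sketch: `data` is a contiguous row-major buffer of shape [..., 2*d]."""
    *batch, two_d = shape
    if two_d % 2:
        raise ValueError("last dimension must be 2*d")
    d = two_d // 2
    # Flatten leading dims: [..., 2*d] -> [M, 2*d]; math.prod([]) == 1 for 1-D input.
    m = math.prod(batch)
    out = []
    for i in range(m):  # stands in for a 2-D launch over rows
        row = data[i * two_d:(i + 1) * two_d]
        a, b = row[:d], row[d:]
        out.extend(act(u) * v for u, v in zip(a, b))
    # Restore the original batch shape, with d in the last dimension.
    return out, (*batch, d)
```

For example, a (2, 3, 4) input comes back with shape (2, 3, 2). Because the buffer is contiguous, flattening only changes how rows are counted, not where the data lives.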

- Gate DO_QUANT in mxfp4 and fp8-group activation Triton kernels
- Add act_mul wrapper and tests in test_activation.py
- Wire bench_moe -bench_act_mul and GLM-4.7 / Kimi-K2.5 TP4/EP4 in model_configs.json

Made-with: Cursor
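Gating the store on a DO_QUANT flag can be modeled in plain Python to show what the two paths produce. This sketch assumes per-group dynamic scaling (scale = group amax / FP8_MAX) and an FP8_MAX of 448.0 (OCP e4m3fn). Both are assumptions: fnuz FP8 on gfx942 tops out at 240.0. The final FP8 cast is omitted. `act_mul_row_sketch` is a hypothetical name, not the Triton kernel.

```python
import math

FP8_MAX = 448.0  # assumption: OCP e4m3fn max; e4m3fnuz (gfx942) would use 240.0


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def act_mul_row_sketch(row, do_quant, group_size=2):
    """Hypothetical model of one row: act(a) * b, optionally group-quantized."""
    d = len(row) // 2
    y = [silu(a) * b for a, b in zip(row[:d], row[d:])]
    if not do_quant:
        return y, None  # DO_QUANT=False: store activations directly, no scales
    q, scales = [], []
    for g in range(0, d, group_size):
        grp = y[g:g + group_size]
        amax = max(abs(v) for v in grp)
        scale = amax / FP8_MAX if amax > 0 else 1.0  # guard all-zero groups
        scales.append(scale)
        # A real kernel would cast to FP8 here; this sketch keeps scaled floats.
        q.extend(max(-FP8_MAX, min(FP8_MAX, v / scale)) for v in grp)
    return q, scales
```

In a Triton kernel the flag would be a `tl.constexpr`, so the unused branch is compiled away rather than tested at run time.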

github-actions · 2026-04-02T11:54:58Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| `ci:triton-355` | Run Triton tests on MI355 in addition to MI325 |
| `ci:sglang` | SGLang integration tests |
| `ci:atom` | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| `ci:vllm` | vLLM benchmark |
| `ci:all` | All of the above |

Add labels via the sidebar or `gh pr edit 2592 --add-label <label>`.


Copilot

Pull request overview

This PR extends the Triton activation/gating path by adding a non-quantized act_mul API that reuses the existing quantization-capable kernels with quant disabled, plus associated tests, benchmarking support, and new model benchmark configs.

Changes:

- Add act_mul (no quantization) in aiter.ops.triton.activation, implemented by calling the existing FP8-group-quant kernel with DO_QUANT=False.
- Add unit tests for the new no-quant act_mul behavior (including out= support) and add a dedicated -bench_act_mul mode to the MoE benchmark script.
- Add GLM-4.7 and Kimi-K2.5 model entries to the Triton benchmark model config JSON.
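The "reuse the quant kernel with quant disabled" pattern, plus out= support, can be sketched in pure Python. `_shared_kernel_sketch` and `act_mul_sketch` are hypothetical stand-ins. Only the no-quant path is modeled, and lists take the place of tensors.

```python
import math
from typing import List, Optional, Tuple


def _silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def _shared_kernel_sketch(row: List[float], do_quant: bool) -> Tuple[List[float], Optional[List[float]]]:
    """Hypothetical stand-in for the shared kernel; quantized path not modeled."""
    if do_quant:
        raise NotImplementedError("quantized path omitted from this sketch")
    d = len(row) // 2
    return [_silu(a) * b for a, b in zip(row[:d], row[d:])], None


def act_mul_sketch(row: List[float], out: Optional[List[float]] = None) -> List[float]:
    y, scales = _shared_kernel_sketch(row, do_quant=False)  # quant disabled
    assert scales is None  # no scale buffer without quantization
    if out is None:
        return y  # caller did not supply a buffer: return a fresh result
    if len(out) != len(y):
        raise ValueError(f"out has {len(out)} elements, expected {len(y)}")
    out[:] = y  # write in place into the caller's buffer
    return out
```

Returning `out` itself when it is supplied mirrors the common PyTorch `out=` convention, so callers can chain on the result either way.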

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| `aiter/ops/triton/activation.py` | Adds `act_mul` and wires `DO_QUANT` into kernel launches. |
| `aiter/ops/triton/_triton_kernels/activation.py` | Adds `DO_QUANT` constexpr paths to support quantized vs non-quantized stores. |
| `op_tests/triton_tests/test_activation.py` | Adds correctness tests for `act_mul` (no quant). |
| `op_tests/op_benchmarks/triton/bench_moe.py` | Adds `-bench_act_mul` benchmarking mode and activation selection. |
| `op_tests/op_benchmarks/triton/utils/model_configs.json` | Adds `glm47fp8` and `kimik25` benchmark configurations. |



juuso-oskari

LGTM

ChuanLi1101

LGTM.

The new act_mul no-quant entrypoint and the DO_QUANT constexpr addition look clean. Tests cover 2D and higher-rank inputs (3D/4D), three activations (silu / gelu / gelu_tanh), and the with/without out= paths. CI looks healthy aside from one Triton MI35X Shard 7 flake -- re-ran the failing shard, looks like the same GPU-oversubscription pattern Ali (@azaidy) called out on #2902.

@vgokhale your feedback is reasonable -- esp. the redundancy of DO_QUANT on the fp4 kernel given the new wrapper only drives the fp8 path. @Chi-Chu319 could you address the three points in a follow-up commit (drop DO_QUANT from _act_mul_and_dynamic_mxfp4_quant_kernel, trim the float16/float32 cases from the higher-rank test) so we keep the merge window open? Approving on the basis that the concern is a polish item rather than a correctness issue.
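The three activations the tests are parametrized over (silu / gelu / gelu_tanh) can be written as a plain-Python reference. These are the standard definitions: exact erf-based GELU and the tanh approximation. Mapping aiter's activation names onto them is an assumption, and `act_mul_ref_sketch` is a hypothetical helper, not the test suite's `torch_act_mul_ref`.

```python
import math


def silu(x: float) -> float:
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


ACTIVATIONS = {"silu": silu, "gelu": gelu, "gelu_tanh": gelu_tanh}


def act_mul_ref_sketch(row, activation):
    """Hypothetical reference: split [a | b] in half and return act(a) * b."""
    d = len(row) // 2
    act = ACTIVATIONS[activation]
    return [act(a) * b for a, b in zip(row[:d], row[d:])]
```

The two GELU variants agree to within about 2e-4 at x = 1, which is why tolerance choices in such tests can differ per activation.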

brunomazzottiamd · 2026-05-14T15:27:35Z

Please check if my comments in #3168 (review) are applicable to this PR. Thanks!


Add act_mul without quant (DO_QUANT), model configs, benchmarks

8aeaf2a


Chi-Chu319 requested review from azaidy, cagrikymk, juuso-oskari, k50112113 and lburzawa April 2, 2026 11:54

Chi-Chu319 mentioned this pull request Apr 2, 2026 in silu_mul_fused kernel #2578 (Merged)

Chi-Chu319 and others added 3 commits April 2, 2026 14:56

Update op_tests/op_benchmarks/triton/bench_moe.py

9c22cd2

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update aiter/ops/triton/activation.py

4bbd3d6

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Merge branch 'main' into feature/act-mul-no-quant

3f06beb

juuso-oskari marked this pull request as ready for review April 7, 2026 09:42

juuso-oskari requested review from a team and Copilot April 7, 2026 09:42

Copilot started reviewing on behalf of juuso-oskari April 7, 2026 09:44

Copilot AI reviewed Apr 7, 2026


Comment thread aiter/ops/triton/activation.py Outdated

Comment thread aiter/ops/triton/activation.py Outdated

juuso-oskari and others added 3 commits April 7, 2026 09:55

no need to move the b load after a activation

048736c

Merge branch 'feature/act-mul-no-quant' of https://github.com/ROCm/aiter into feature/act-mul-no-quant

8c333b9

Update aiter/ops/triton/activation.py

1357f0e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot started work on behalf of juuso-oskari April 7, 2026 10:02

fix: act_mul handles higher-rank inputs by flattening/unflattening internally

775a25a

Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/6bb3b38c-af05-43d0-9fb0-86d98eb98852 Co-authored-by: juuso-oskari <40278371+juuso-oskari@users.noreply.github.com>

Copilot finished work on behalf of juuso-oskari April 7, 2026 10:14

juuso-oskari previously approved these changes Apr 7, 2026


do mul in b dtype and use more universal x_out_ptr name

85c792d

juuso-oskari dismissed their stale review via 85c792d April 7, 2026 11:08

juuso-oskari added 2 commits April 7, 2026 11:10

black

8e90c8c

revert back the b dtype mul

0f07849

cagrikymk changed the title ~~Add act_mul without quant (DO_QUANT), model configs, benchmarks~~ [TRITON] Add act_mul without quant (DO_QUANT), model configs, benchmarks Apr 7, 2026

cagrikymk previously approved these changes Apr 7, 2026


Comment thread aiter/ops/triton/activation.py Outdated

Merge branch 'main' into feature/act-mul-no-quant

53171cd

vgokhale reviewed May 13, 2026


Comment thread op_tests/triton_tests/test_activation.py Outdated

vgokhale reviewed May 13, 2026


Comment thread op_tests/triton_tests/test_activation.py Outdated

vgokhale reviewed May 13, 2026


Comment thread aiter/ops/triton/_triton_kernels/activation.py

ChuanLi1101 previously approved these changes May 14, 2026


Chi-Chu319 and others added 3 commits May 15, 2026 07:44

Merge branch 'main' into feature/act-mul-no-quant

daa2a75

Remove DO_QUANT parameter from activation functions and update related kernel logic to ensure quantization is always applied. Adjust test cases to reflect the change in data type parameterization.

321bcf5

Merge branch 'feature/act-mul-no-quant' of https://github.com/ROCm/aiter into feature/act-mul-no-quant

85941be

Chi-Chu319 dismissed stale reviews from cagrikymk and ChuanLi1101 via 85941be May 15, 2026 05:07

Chi-Chu319 added 2 commits May 15, 2026 05:11

renamed x_out_ptr back to x_fp4_ptr

9c90e54

move fp8_dtype = aiter.dtypes.fp8 after imports

3a0c78f

Chi-Chu319 mentioned this pull request May 15, 2026 in [TRITON] gfx1201: gemm_a8w8 tuning configs (Mistral-3 / Qwen3 shapes) #3168 (Open, 3 tasks)

Refactor act_mul function to remove unused dummy_bs tensor and update kernel call to use None for batch strides. Adjust stride handling in the _act_mul_and_dynamic_fp8_group_quant_kernel to ensure proper quantization logic.

b4a4117

Chi-Chu319 wants to merge 18 commits into main from feature/act-mul-no-quant


8 participants
