
[TRITON] Add act_mul without quant (DO_QUANT), model configs, benchmarks#2592

Open
Chi-Chu319 wants to merge 18 commits into main from feature/act-mul-no-quant


@Chi-Chu319 Chi-Chu319 commented Apr 2, 2026

  • Fix act_mul in aiter/ops/triton/activation.py to handle higher-rank inputs ([..., 2*d]) by flattening leading dims before the kernel call and restoring the original batch shape on return
  • Update docstring to document the reshape behavior
  • Fix torch_act_mul_ref in tests to use ... indexing for higher-rank support
  • Add test_act_mul_no_quant_higher_rank test covering 3D and 4D inputs
  • Run parallel validation
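
The flatten-and-restore behavior described in the first bullet can be sketched in NumPy. This is only an illustration, not the code in aiter/ops/triton/activation.py: `act_mul_2d` is a hypothetical stand-in for the 2D Triton kernel launch, SiLU is picked arbitrarily as the activation, and the convention that the first half of the last dimension is the activated half is an assumption.

```python
import numpy as np


def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))


def act_mul_2d(x2d):
    # Hypothetical stand-in for the 2D kernel: input (M, 2*d), output (M, d).
    d = x2d.shape[-1] // 2
    return silu(x2d[:, :d]) * x2d[:, d:]


def act_mul_any_rank(x):
    # Accept [..., 2*d]: flatten leading dims, run the 2D path, restore the batch shape.
    *batch, two_d = x.shape
    if two_d % 2 != 0:
        raise ValueError("last dimension must be even (2*d)")
    out2d = act_mul_2d(x.reshape(-1, two_d))
    return out2d.reshape(*batch, two_d // 2)


def act_mul_ref(x):
    # Reference using `...` indexing, so it works for any rank without reshaping.
    d = x.shape[-1] // 2
    return silu(x[..., :d]) * x[..., d:]
```

Comparing `act_mul_any_rank` against `act_mul_ref` on 3D and 4D inputs is the kind of check the higher-rank test in this PR is described as performing.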

- Gate DO_QUANT in mxfp4 and fp8-group activation Triton kernels
- Add act_mul wrapper and tests in test_activation.py
- Wire bench_moe -bench_act_mul and GLM-4.7 / Kimi-K2.5 TP4/EP4 in model_configs.json

Made-with: Cursor

github-actions Bot commented Apr 2, 2026

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2592 --add-label <label>

@Chi-Chu319 Chi-Chu319 mentioned this pull request Apr 2, 2026
Chi-Chu319 and others added 3 commits April 2, 2026 14:56
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@juuso-oskari juuso-oskari marked this pull request as ready for review April 7, 2026 09:42
@juuso-oskari juuso-oskari requested review from a team and Copilot April 7, 2026 09:42

Copilot AI left a comment

Pull request overview

This PR extends the Triton activation/gating path by adding a non-quantized act_mul API that reuses the existing quantization-capable kernels with quant disabled, plus associated tests, benchmarking support, and new model benchmark configs.

Changes:

  • Add act_mul (no quantization) in aiter.ops.triton.activation, implemented by calling the existing FP8-group-quant kernel with DO_QUANT=False.
  • Add unit tests for the new no-quant act_mul behavior (including out= support) and add a dedicated -bench_act_mul mode to the MoE benchmark script.
  • Add GLM-4.7 and Kimi-K2.5 model entries to the Triton benchmark model config JSON.
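
The idea of reusing one quantization-capable routine with quantization switched off can be sketched as follows. This is a NumPy illustration under stated assumptions, not the Triton kernel: the function name, the `group_size` default, the `1e-10` floor on the scale, and the use of 448 (the largest finite float8 e4m3fn value) as the clamp bound are all chosen for the example, and the quantized branch only models per-group scaling and clamping without casting to an FP8 dtype.

```python
import numpy as np

# Assumed bound: max finite value of float8 e4m3fn. The PR's target dtype may differ.
FP8_MAX = 448.0


def act_mul_maybe_quant(x, group_size=4, do_quant=True):
    # Hypothetical helper: SiLU(gate) * up, optionally followed by per-group scaling.
    d = x.shape[-1] // 2
    gate, up = x[..., :d], x[..., d:]
    y = gate / (1.0 + np.exp(-gate)) * up
    if not do_quant:
        # No-quant path: return the activation product directly, with no scales.
        return y, None
    # Quant path: one scale per group of `group_size` output elements.
    g = y.reshape(*y.shape[:-1], d // group_size, group_size)
    amax = np.abs(g).max(axis=-1, keepdims=True)
    scale = np.maximum(amax, 1e-10) / FP8_MAX
    q = np.clip(g / scale, -FP8_MAX, FP8_MAX)
    return q.reshape(y.shape), scale.squeeze(-1)
```

With `do_quant=False` the caller gets the plain result, which mirrors how the overview describes `act_mul` driving the existing kernel with `DO_QUANT=False`; multiplying the scaled values by their group scales recovers the same result, because this sketch does no rounding.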

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| aiter/ops/triton/activation.py | Adds act_mul and wires DO_QUANT into kernel launches. |
| aiter/ops/triton/_triton_kernels/activation.py | Adds DO_QUANT constexpr paths to support quantized vs non-quantized stores. |
| op_tests/triton_tests/test_activation.py | Adds correctness tests for act_mul (no quant). |
| op_tests/op_benchmarks/triton/bench_moe.py | Adds -bench_act_mul benchmarking mode and activation selection. |
| op_tests/op_benchmarks/triton/utils/model_configs.json | Adds glm47fp8 and kimik25 benchmark configurations. |

Comment thread aiter/ops/triton/activation.py Outdated
Comment thread aiter/ops/triton/activation.py Outdated
juuso-oskari and others added 3 commits April 7, 2026 09:55
…ternally

Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/6bb3b38c-af05-43d0-9fb0-86d98eb98852

Co-authored-by: juuso-oskari <40278371+juuso-oskari@users.noreply.github.com>
juuso-oskari previously approved these changes Apr 7, 2026

@juuso-oskari juuso-oskari left a comment

LGTM

@cagrikymk cagrikymk changed the title from "Add act_mul without quant (DO_QUANT), model configs, benchmarks" to "[TRITON] Add act_mul without quant (DO_QUANT), model configs, benchmarks" Apr 7, 2026
cagrikymk previously approved these changes Apr 7, 2026
Comment thread aiter/ops/triton/activation.py Outdated
Comment thread op_tests/triton_tests/test_activation.py Outdated
Comment thread op_tests/triton_tests/test_activation.py Outdated
Comment thread aiter/ops/triton/_triton_kernels/activation.py
ChuanLi1101 previously approved these changes May 14, 2026

@ChuanLi1101 ChuanLi1101 left a comment

LGTM.

The new act_mul no-quant entrypoint and the DO_QUANT constexpr addition look clean. Tests cover 2D and higher-rank inputs (3D/4D), three activations (silu / gelu / gelu_tanh), and the with/without out= paths. CI looks healthy aside from one Triton MI35X Shard 7 flake -- re-ran the failing shard, looks like the same GPU-oversubscription pattern Ali (@azaidy) called out on #2902.

@vgokhale your feedback is reasonable -- esp. the redundancy of DO_QUANT on the fp4 kernel given the new wrapper only drives the fp8 path. @Chi-Chu319 could you address the three points in a follow-up commit (drop DO_QUANT from _act_mul_and_dynamic_mxfp4_quant_kernel, trim the float16/float32 cases from the higher-rank test) so we keep the merge window open? Approving on the basis that the concern is a polish item rather than a correctness issue.
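
For context, the coverage described above (three activations, with and without `out=`) can be sketched as a NumPy reference check. This is an illustration only: `act_mul_ref`, `ACTIVATIONS`, and `check_all` are hypothetical names, not the helpers in test_activation.py, and the assumption that the first half of the last dimension is the activated half is mine.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # NumPy has no erf; vectorize the stdlib one

ACTIVATIONS = {
    "silu": lambda x: x / (1.0 + np.exp(-x)),
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    "gelu": lambda x: 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0))),
    # Tanh approximation of GELU
    "gelu_tanh": lambda x: 0.5 * x * (
        1.0 + np.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3))
    ),
}


def act_mul_ref(x, activation, out=None):
    # act(first half) * second half along the last dimension, for any rank.
    d = x.shape[-1] // 2
    y = ACTIVATIONS[activation](x[..., :d]) * x[..., d:]
    if out is None:
        return y
    out[...] = y  # write into the caller's pre-allocated buffer
    return out


def check_all(shape=(2, 3, 8), seed=0):
    # Exercise every activation on both the allocate-and-return and out= paths.
    x = np.random.default_rng(seed).standard_normal(shape)
    for name in ACTIVATIONS:
        fresh = act_mul_ref(x, name)
        buf = np.empty(shape[:-1] + (shape[-1] // 2,))
        returned = act_mul_ref(x, name, out=buf)
        assert returned is buf
        assert np.allclose(buf, fresh)
    return True
```

Calling `check_all()` with a 2D, 3D, or 4D `shape` runs the same comparison at each rank.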

@brunomazzottiamd

Please check if my comments in #3168 (review) are applicable to this PR. Thanks!

Chi-Chu319 and others added 3 commits May 15, 2026 07:44
…d kernel logic to ensure quantization is always applied. Adjust test cases to reflect the change in data type parameterization.
@Chi-Chu319 Chi-Chu319 dismissed stale reviews from cagrikymk and ChuanLi1101 via 85941be May 15, 2026 05:07
… kernel call to use None for batch strides. Adjust stride handling in the _act_mul_and_dynamic_fp8_group_quant_kernel to ensure proper quantization logic.

8 participants