[TRITON] Add act_mul without quant (DO_QUANT), model configs, benchmarks #2592
Chi-Chu319 wants to merge 18 commits into
- Gate DO_QUANT in mxfp4 and fp8-group activation Triton kernels
- Add act_mul wrapper and tests in test_activation.py
- Wire bench_moe -bench_act_mul and GLM-4.7 / Kimi-K2.5 TP4/EP4 in model_configs.json

Made-with: Cursor
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Pull request overview
This PR extends the Triton activation/gating path by adding a non-quantized act_mul API that reuses the existing quantization-capable kernels with quant disabled, plus associated tests, benchmarking support, and new model benchmark configs.
Changes:
- Add `act_mul` (no quantization) in `aiter.ops.triton.activation`, implemented by calling the existing FP8-group-quant kernel with `DO_QUANT=False`.
- Add unit tests for the new no-quant `act_mul` behavior (including `out=` support) and add a dedicated `-bench_act_mul` mode to the MoE benchmark script.
- Add GLM-4.7 and Kimi-K2.5 model entries to the Triton benchmark model config JSON.
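For readers unfamiliar with the op, the following is a minimal, dependency-free sketch of what a no-quant `act_mul` computes for a single row of length `2*d`. It is not the PR's code: the helper name `act_mul_ref`, the `ACTIVATIONS` table, and the assumption that the first half of the row is the activated (gate) half are illustrative choices, and the real implementation operates on tensors inside a Triton kernel.

```python
import math


def silu(v: float) -> float:
    # SiLU / swish: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def gelu(v: float) -> float:
    # Exact (erf-based) GELU
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def gelu_tanh(v: float) -> float:
    # Tanh approximation of GELU
    inner = math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)
    return 0.5 * v * (1.0 + math.tanh(inner))


# Hypothetical lookup table; the names mirror the activations the tests cover.
ACTIVATIONS = {"silu": silu, "gelu": gelu, "gelu_tanh": gelu_tanh}


def act_mul_ref(row, activation="silu"):
    """Reference for one row of length 2*d: act(row[:d]) * row[d:].

    Assumption: the first half is the gate that receives the activation.
    """
    if len(row) % 2 != 0:
        raise ValueError("last dimension must be even (2*d)")
    d = len(row) // 2
    act = ACTIVATIONS[activation]
    return [act(g) * u for g, u in zip(row[:d], row[d:])]
```

With quantization disabled, the output keeps half the input width and no scale tensor is produced, which is the behavior the unit tests compare against.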
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `aiter/ops/triton/activation.py` | Adds `act_mul` and wires `DO_QUANT` into kernel launches. |
| `aiter/ops/triton/_triton_kernels/activation.py` | Adds `DO_QUANT` constexpr paths to support quantized vs non-quantized stores. |
| `op_tests/triton_tests/test_activation.py` | Adds correctness tests for `act_mul` (no quant). |
| `op_tests/op_benchmarks/triton/bench_moe.py` | Adds `-bench_act_mul` benchmarking mode and activation selection. |
| `op_tests/op_benchmarks/triton/utils/model_configs.json` | Adds `glm47fp8` and `kimik25` benchmark configurations. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ternally Agent-Logs-Url: https://github.com/ROCm/aiter/sessions/6bb3b38c-af05-43d0-9fb0-86d98eb98852 Co-authored-by: juuso-oskari <40278371+juuso-oskari@users.noreply.github.com>
LGTM.
The new act_mul no-quant entrypoint and the DO_QUANT constexpr addition look clean. Tests cover 2D and higher-rank inputs (3D/4D), three activations (silu / gelu / gelu_tanh), and the with/without out= paths. CI looks healthy aside from one Triton MI35X Shard 7 flake -- re-ran the failing shard, looks like the same GPU-oversubscription pattern Ali (@azaidy) called out on #2902.
@vgokhale your feedback is reasonable -- esp. the redundancy of DO_QUANT on the fp4 kernel given the new wrapper only drives the fp8 path. @Chi-Chu319 could you address the three points in a follow-up commit (drop DO_QUANT from _act_mul_and_dynamic_mxfp4_quant_kernel, trim the float16/float32 cases from the higher-rank test) so we keep the merge window open? Approving on the basis that the concern is a polish item rather than a correctness issue.
Please check if my comments in #3168 (review) are applicable to this PR. Thanks!
…d kernel logic to ensure quantization is always applied. Adjust test cases to reflect the change in data type parameterization.
… kernel call to use None for batch strides. Adjust stride handling in the _act_mul_and_dynamic_fp8_group_quant_kernel to ensure proper quantization logic.
- `act_mul` in `aiter/ops/triton/activation.py`: handles higher-rank inputs (`[..., 2*d]`) by flattening leading dims before the kernel call and restoring the original batch shape on return
- `torch_act_mul_ref` in tests: uses `...` indexing for higher-rank support
- `test_act_mul_no_quant_higher_rank`: test covering 3D and 4D inputs
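The flatten-then-restore handling of higher-rank inputs can be sketched as pure shape arithmetic. This is an illustration only: `flatten_for_kernel` is a hypothetical helper, not a function from the PR, and it assumes the kernel consumes a 2D `(rows, 2*d)` view and produces `(rows, d)`.

```python
import math


def flatten_for_kernel(shape):
    """Map an input shape [..., 2*d] to (kernel_shape, output_shape).

    kernel_shape is the 2D (rows, 2*d) view handed to the kernel;
    output_shape is the original batch shape with the last dim halved.
    """
    if len(shape) < 1 or shape[-1] % 2 != 0:
        raise ValueError("expected shape [..., 2*d] with an even last dim")
    *batch, last = shape
    # math.prod of an empty sequence is 1, so a 1D input becomes a single row.
    rows = math.prod(batch)
    return (rows, last), (*batch, last // 2)
```

For example, a 3D input of shape `(2, 3, 8)` would be viewed as `(6, 8)` for the kernel and returned as `(2, 3, 4)`. In PyTorch terms this likely corresponds to a `reshape(-1, x.shape[-1])` before the launch and a reshape back to the batch dims afterwards, though the exact calls in the PR may differ.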