[Optimization]: Reduce branching when possible in casting.hpp #117
zacharyvincze wants to merge 27 commits into ROCm:develop
Pull request overview
This PR updates the core casting helpers to reduce branching (especially for GPU code paths) and adjusts saturation behavior for some float→integer conversions, alongside adding a small test and extending supported type traits.
Changes:
- Refactors `ScalarSaturateCast`/`ScalarRangeCast` logic in `casting.hpp` to use more branchless/min-max based clamping and special-case small integer widths.
- Extends type traits support to include `long`/`ulong` vectorized types.
- Adds a new C++ test covering basic `SaturateCast` behavior and a few limit/vector cases.
- Adjusts the GPU block dimensions for the Composite operator kernel launch.
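To make the first bullet concrete, here is a minimal sketch of min/max-based clamping for a float to small-integer saturating cast. It is not the code from `casting.hpp`: the name `SaturateCastSketch`, the choice of `std::nearbyint` for rounding, and the restriction to 8- and 16-bit targets are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not the roccv API): saturating float -> integer cast
// for 8- and 16-bit integer targets, written without if/else.
template <typename T>
T SaturateCastSketch(float v) {
    static_assert(std::is_integral<T>::value && sizeof(T) < 4,
                  "sketch only covers 8- and 16-bit integer targets");
    // Both limits of a small integer type are exactly representable as float,
    // so clamping in float loses nothing here.
    const float lo = static_cast<float>(std::numeric_limits<T>::min());
    const float hi = static_cast<float>(std::numeric_limits<T>::max());
    // Round first (current rounding mode, round-to-nearest-even by default),
    // then clamp with fmax/fmin instead of comparing against each limit.
    const float rounded = std::nearbyint(v);
    // fmax returns the non-NaN argument, so a NaN input collapses to lo.
    const float clamped = std::fmin(std::fmax(rounded, lo), hi);
    return static_cast<T>(clamped);
}
```

With this sketch, `SaturateCastSketch<uint8_t>(300.0f)` yields 255 and `SaturateCastSketch<int8_t>(-200.0f)` yields -128. The min/max form typically lowers to select or min/max instructions rather than divergent branches, which is the effect the PR is aiming for on GPU code paths.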
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `include/core/detail/casting.hpp` | Refactors saturate/range cast implementations to reduce branching and adjust clamping/rounding logic. |
| `include/core/detail/type_traits.hpp` | Adds `long` / `ulong` to the type-traits macro set. |
| `tests/roccv/cpp/src/tests/core/detail/test_saturate_cast.cpp` | Introduces a basic unit test for `SaturateCast`, including a couple of vectorized casts. |
| `src/op_composite.cpp` | Changes GPU kernel launch block dimensions for the composite operator. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff

| Metric | develop | #117 | +/- |
|---|---|---|---|
| Coverage | 74.40% | 74.38% | -0.02% |
| Files | 79 | 79 | |
| Lines | 3355 | 3368 | +13 |
| Branches | 738 | 733 | -5 |
| Hits | 2496 | 2505 | +9 |
| Misses | 378 | 379 | +1 |
| Partials | 481 | 484 | +3 |
Review: [Optimization] Reduce branching in casting.hpp
Kernel optimization focusing on GPU divergence.
Changes:
Assessment: Needs Review - Performance optimization. Reducing branching in GPU kernels is always good for warp efficiency. The fixes for 32/64-bit integer saturation casts sound important - precision issues in type conversion can be subtle bugs. Would benefit from:
Solid optimization PR.
Fixed some issues caused by assuming that certain types were floats. The code now uses the proper function version when computing with doubles, so precision is maintained. Also added more tests to catch edge cases for Range/Saturate/Static casting.
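One plausible illustration of why the float assumption matters for 32-bit targets (a sketch, not the code from this PR): a `float` has a 24-bit significand, so `INT32_MAX` is not exactly representable in it, and a clamp against the converted limit can end up above the true maximum. Doing the clamp in `double` keeps both limits exact. The function names below are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// True if the int32 upper limit survives conversion to float unchanged.
inline bool Int32MaxExactInFloat() {
    const float hi = static_cast<float>(std::numeric_limits<int32_t>::max());
    return static_cast<double>(hi) == 2147483647.0;
}

// True if the int32 upper limit survives conversion to double unchanged.
inline bool Int32MaxExactInDouble() {
    const double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
    return hi == 2147483647.0;
}

// Hypothetical helper: saturating double -> int32 cast that rounds and clamps
// entirely in double, using the double overloads of nearbyint/fmax/fmin.
inline int32_t SaturateToInt32Sketch(double v) {
    const double lo = static_cast<double>(std::numeric_limits<int32_t>::min());
    const double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
    // The clamped value is always inside the int32 range, so the final cast
    // cannot overflow. A NaN input collapses to lo because fmax ignores NaN.
    const double clamped = std::fmin(std::fmax(std::nearbyint(v), lo), hi);
    return static_cast<int32_t>(clamped);
}
```

Here `Int32MaxExactInFloat()` is false while `Int32MaxExactInDouble()` is true, and `SaturateToInt32Sketch(3e9)` saturates to 2147483647. The same reasoning does not carry over to 64-bit targets, where even `double` cannot hold the limits exactly, so those would need separate handling.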
Details
casting.hpp. Aims to reduce divergence on GPU kernel implementations.