[Optimization]: Use dynamic HIP kernel block sizes #122
zacharyvincze wants to merge 32 commits into ROCm:develop
Pull request overview
This PR introduces dynamic HIP kernel block size determination to optimize performance across different GPU architectures. It adds helper functions GetMaximumPotentialBlockSize2D and GetGridSize2D to replace hardcoded block sizes throughout the codebase.
Changes:
- Added GetMaximumPotentialBlockSize2D and GetGridSize2D helper functions in include/core/detail/hip_utils.hpp
- Refactored 14 operator files to use dynamic block sizes instead of hardcoded values
- Cleaned up unused includes and reorganized some headers
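
The PR's diff is not shown here, so the following is only a minimal sketch of the general idea, not the actual contents of hip_utils.hpp. It assumes a 1D thread budget has already been obtained (in a HIP build this would typically come from an occupancy query such as `hipOccupancyMaxPotentialBlockSize`), splits it into a 2D block shape, and derives the grid size by ceiling division. The names `Dim2`, `SplitBlockSize2D`, and `CeilDivGrid2D`, and the default row width of 64 threads, are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for dim3, so the sketch compiles without HIP headers.
struct Dim2 {
    uint32_t x;
    uint32_t y;
};

// Hypothetical helper: split a 1D thread budget into a 2D block shape.
// Assumption: maxThreads is the block size suggested by an occupancy query.
// The x dimension is capped at rowWidth (64 here, an assumed value) and the
// remaining threads go to y.
inline Dim2 SplitBlockSize2D(uint32_t maxThreads, uint32_t rowWidth = 64) {
    assert(rowWidth > 0);
    uint32_t x = maxThreads < rowWidth ? maxThreads : rowWidth;
    if (x == 0) x = 1;              // never return an empty block
    uint32_t y = maxThreads / x;    // integer division, rounds down
    if (y == 0) y = 1;
    return {x, y};
}

// Hypothetical helper: number of blocks needed to cover a width x height
// image, rounding up so that partial tiles at the edges are still covered.
inline Dim2 CeilDivGrid2D(uint32_t width, uint32_t height, Dim2 block) {
    assert(block.x > 0 && block.y > 0);
    return {(width + block.x - 1) / block.x,
            (height + block.y - 1) / block.y};
}
```

With a budget of 1024 threads this sketch yields a (64,16) block, which matches the most common hardcoded value listed in the file summary; a 1920x1080 image then needs a 30x68 grid. Kernels launched this way still need a bounds check, because the grid can overshoot the image edges.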
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| include/core/detail/hip_utils.hpp | Adds helper functions for dynamic 2D block size calculation using HIP occupancy API |
| include/kernels/host/bnd_box_host.hpp | Adds missing math_vector.hpp include |
| include/kernels/device/bnd_box_device.hpp | Adds missing math_vector.hpp include |
| src/op_warp_perspective.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_thresholding.cpp | Replaces hardcoded (64,16) block size with dynamic calculation per threshold type |
| src/op_rotate.cpp | Replaces hardcoded (32,16) block size with dynamic calculation |
| src/op_resize.cpp | Replaces hardcoded (64,16) block size with dynamic calculation; reformats function map |
| src/op_remap.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_normalize.cpp | Replaces hardcoded (32,8) block size with dynamic calculation; reformats line breaks |
| src/op_non_max_suppression.cpp | Removes unused include (no block size changes as this uses 1D blocks) |
| src/op_histogram.cpp | Removes unused includes (no block size changes) |
| src/op_gamma_contrast.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_flip.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_cvt_color.cpp | Replaces hardcoded (32,16) block size with dynamic calculation per color conversion type |
| src/op_custom_crop.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_copy_make_border.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_convert_to.cpp | Replaces hardcoded (64,16) block size with dynamic calculation; reformats function signatures |
| src/op_composite.cpp | Replaces hardcoded (64,16) block size with dynamic calculation |
| src/op_bnd_box.cpp | Replaces hardcoded (32,32) block size with dynamic calculation |
| src/op_bilateral_filter.cpp | Replaces hardcoded (8,8) block size with dynamic calculation |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #122 +/- ##
===========================================
- Coverage 74.40% 74.39% -0.01%
===========================================
Files 79 79
Lines 3355 3381 +26
Branches 738 740 +2
===========================================
+ Hits 2496 2515 +19
- Misses 378 382 +4
- Partials 481 484 +3
@@ -1,2 +1,2 @@
/**
Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
Update the copyright (c) year to 2026 here and in other places.
…aryvincze/rocCV into zv/optimization/dynamic-block-sizes
Review: [Optimization] Dynamic HIP kernel block sizes
Performance optimization for GPU kernel launches.
What's changed:
Assessment: APPROVED - Performance improvement with no API changes. Already approved, this is a solid optimization. Removing hard-coded block sizes improves portability across different GPU architectures (MI series, RDNA, etc.). The approach of querying wavefront size at runtime is the right way to handle this. Nice cleanup of magic numbers throughout the codebase.
…aryvincze/rocCV into zv/optimization/dynamic-block-sizes