Datapar search fixes #7288
Signed-off-by: ArivoliR <arivoli2005@gmail.com>
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR improves SIMD/datapar behavior for search and search_n, and adds performance benchmarks to compare HPX execution policies against the standard library.
Changes:
- Updated SIMD vector pack unaligned load/store to use explicit (element-aligned) pointer-based loads/stores for non-scalar packs.
- Reworked datapar implementations in `search.hpp` to use SIMD prefiltering (for `search`) and a sliding-window SIMD scan (for `search_n`).
- Added new performance benchmarks for `search` and `search_n` and registered them in the performance CMake target list.
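
The sliding-window scan for `search_n` can be sketched roughly as follows: broadcast the value, skip packs with no match, extend a carried run when a whole pack matches, and fall back to lane-by-lane counting for mixed packs. This is an illustration only, not the code in `search.hpp`. It assumes `std::experimental::simd` as shipped with libstdc++ (GCC 11 or later), handles `std::vector<int>` only, and the name `sliding_search_n` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical helper: index of the first run of `count` consecutive
// elements equal to `value`, or hay.size() if there is no such run.
inline std::size_t sliding_search_n(
    std::vector<int> const& hay, std::size_t count, int value)
{
    using pack = stdx::native_simd<int>;
    constexpr std::size_t lanes = pack::size();

    if (count == 0)
        return 0;

    pack const wanted(value);    // broadcast `value` to every lane
    std::size_t run = 0;         // matches immediately preceding index i
    std::size_t i = 0;

    for (; i + lanes <= hay.size(); i += lanes)
    {
        assert(run < count);     // a completed run would have returned
        pack chunk;
        chunk.copy_from(hay.data() + i, stdx::element_aligned);
        auto const eq = (chunk == wanted);

        if (stdx::none_of(eq))
        {
            run = 0;             // no match in this pack: drop the run
            continue;
        }
        if (stdx::all_of(eq))
        {
            // Whole pack matches: extend the run carried in from before.
            if (run + lanes >= count)
                return i - run;  // the run started `run` elements earlier
            run += lanes;
            continue;
        }
        // Mixed pack: count lane by lane.
        for (std::size_t lane = 0; lane < lanes; ++lane)
        {
            if (!eq[lane])
                run = 0;
            else if (++run == count)
                return i + lane + 1 - count;
        }
    }

    // Scalar tail for the elements that do not fill a whole pack.
    for (; i < hay.size(); ++i)
    {
        if (hay[i] != value)
            run = 0;
        else if (++run == count)
            return i + 1 - count;
    }
    return hay.size();
}
```

The carried `run` counter is what lets a run straddle pack boundaries without re-reading earlier elements.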
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| libs/core/execution/include/hpx/execution/traits/detail/simd/vector_pack_load_store.hpp | Adjusts unaligned vector pack load/store to use pointer-based element-aligned operations for non-scalar packs. |
| libs/core/algorithms/include/hpx/parallel/datapar/search.hpp | Introduces new SIMD-accelerated scanning strategies for search and search_n. |
| libs/core/algorithms/tests/performance/benchmark_search.cpp | Adds a benchmark comparing hpx::search policies vs std::search. |
| libs/core/algorithms/tests/performance/benchmark_search_n.cpp | Adds a benchmark comparing hpx::search_n policies vs std::search_n. |
| libs/core/algorithms/tests/performance/CMakeLists.txt | Registers the new benchmark executables. |

    std::cout << "\n-------------- Benchmark Result --------------"
              << std::endl;
    auto fmt = "search ({1}) : {2}(sec)";

    std::cout << "\n-------------- Benchmark Result --------------"
              << std::endl;
    auto fmt = "search_n ({1}) : {2}(sec)";

    for (int i = 0; i < test_count; ++i)
    {
        std::size_t pos = (n * 3 / 4) +
            (static_cast<std::size_t>(i) * 1000003) % (n / 4 - needle_count);

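
One property of the position formula above is worth noting: the modulus `n / 4 - needle_count` is computed on `std::size_t`, so it is zero when `n / 4 == needle_count` (division by zero) and wraps around to a huge value when `n / 4 < needle_count`. A minimal sketch of a guarded variant follows. The helper name `needle_position` and the use of `std::optional` are assumptions for illustration and are not taken from the benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: start offset of the i-th needle placement inside the
// last quarter of a haystack of n elements, or std::nullopt when the needle
// does not leave at least one free position in that quarter.
inline std::optional<std::size_t> needle_position(
    std::size_t n, std::size_t needle_count, std::size_t i)
{
    std::size_t const quarter = n / 4;
    if (quarter <= needle_count)
        return std::nullopt;    // modulus would be zero or would wrap

    std::size_t const pos =
        (n * 3 / 4) + (i * 1000003) % (quarter - needle_count);

    // The needle placed at pos always fits inside the haystack.
    assert(pos + needle_count <= n);
    return pos;
}
```

A caller could skip the iteration or reject the command-line arguments when the helper returns `std::nullopt`.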
    #include <hpx/init.hpp>
    #include <hpx/modules/testing.hpp>
    #include <hpx/program_options.hpp>

    // SIMD bulk pass
    for (; i + pack_size <= scan_count; i += pack_size)
    {
        if (tok.was_cancelled(base_idx))

    using value_type =
        typename std::iterator_traits<Iter1>::value_type;
    using pack_type =
        hpx::parallel::traits::vector_pack_type_t<value_type>;
    constexpr std::size_t pack_size =
        hpx::parallel::traits::vector_pack_size_v<pack_type>;

    // First-element broadcast: SIMD-scan haystack for matches of
    // needle[0]; only candidates pay the scalar full-needle verify.
    auto const needle0 = HPX_INVOKE(proj2, *s_first);

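
The first-element prefilter described in the comment above can be illustrated with a small standalone sketch: compare a pack of haystack elements against a broadcast of `needle[0]`, and run the scalar full-needle comparison only for lanes that hit. This is not the HPX implementation. It assumes `std::experimental::simd` from libstdc++ (GCC 11 or later), omits projections, predicates and cancellation tokens, and the name `prefilter_search` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical helper: index of the first occurrence of `needle` in `hay`,
// or hay.size() if it does not occur. An empty needle matches at index 0.
inline std::size_t prefilter_search(
    std::vector<int> const& hay, std::vector<int> const& needle)
{
    using pack = stdx::native_simd<int>;
    constexpr std::size_t lanes = pack::size();

    if (needle.empty())
        return 0;
    if (needle.size() > hay.size())
        return hay.size();

    // Number of start positions at which a full needle still fits.
    std::size_t const scan_count = hay.size() - needle.size() + 1;
    pack const first(needle[0]);    // broadcast needle[0] to every lane

    // Scalar verification of needle[1..]; only candidates pay for it.
    auto verify = [&](std::size_t pos) {
        assert(pos + needle.size() <= hay.size());
        for (std::size_t k = 1; k < needle.size(); ++k)
            if (hay[pos + k] != needle[k])
                return false;
        return true;
    };

    std::size_t i = 0;
    for (; i + lanes <= scan_count; i += lanes)
    {
        pack chunk;
        chunk.copy_from(hay.data() + i, stdx::element_aligned);
        auto const hits = (chunk == first);
        if (stdx::none_of(hits))
            continue;               // no candidate in this pack
        for (std::size_t lane = 0; lane < lanes; ++lane)
            if (hits[lane] && verify(i + lane))
                return i + lane;
    }

    // Scalar tail for the remaining start positions.
    for (; i < scan_count; ++i)
        if (hay[i] == needle[0] && verify(i))
            return i;

    return hay.size();
}
```

The prefilter pays off when `needle[0]` is rare in the haystack; if most lanes hit, the work degenerates to the scalar comparison.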
Can one of the admins verify this patch?
hkaiser left a comment:
Could you please apply the same fix for the other datapar backends (if necessary)?
@ArivoliR Also, could you please add a test that verifies your unaligned load fix?
On it!
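
A test along the lines requested above might check that a load from a pointer that is only element-aligned yields distinct lanes, whereas a broadcast of the first element does not. The sketch below shows the idea with `std::experimental::simd` directly (libstdc++, GCC 11 or later, is assumed). It does not use the HPX `vector_pack_load` traits, and all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;
using pack = stdx::native_simd<int>;

// Broadcast: every lane holds the single value *p.
inline pack broadcast_first(int const* p)
{
    assert(p != nullptr);
    return pack(*p);
}

// Element-aligned load: lane k holds p[k]; p only needs alignof(int).
inline pack load_unaligned(int const* p)
{
    assert(p != nullptr);
    pack v;
    v.copy_from(p, stdx::element_aligned);
    return v;
}

// Element-aligned store: writes lane k to p[k].
inline void store_unaligned(pack const& v, int* p)
{
    assert(p != nullptr);
    v.copy_to(p, stdx::element_aligned);
}

// True if lane k of v equals p[k] for every lane.
inline bool lanes_match(pack const& v, int const* p)
{
    for (std::size_t k = 0; k < pack::size(); ++k)
        if (v[k] != p[k])
            return false;
    return true;
}
```

Loading from `data + 1` of an array filled with distinct values makes the difference visible: the element-aligned load reproduces the memory contents, while the broadcast matches only when the pack has a single lane.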
Partially fixes #7287
Follow-up to PR #6995
Proposed Changes
- Fixes `vector_pack_load::unaligned` (std-experimental-simd backend) to perform a real unaligned load instead of broadcasting `*iter` across all lanes; symmetric fix to the store.
- Reworks `sequential_search_n` to use a sliding-window SIMD scan (broadcast value, `none_of` skip, `all_of` carry-extend, lane-by-lane in mixed packs) instead of the per-position scan-with-`const_loop_n` approach.
- Reworks `sequential_search` along the same lines: SIMD-broadcast `needle[0]` across the haystack, scalar-verify the full needle only on lane hits. Drops the per-position `zip_iterator` + nested SIMD dispatch.

Any background context you want to provide?
Checklist
Not all points below apply to all pull requests.
Benchmarks
`hpx::search`
`hpx::search_n`