Datapar search fixes by ArivoliR · Pull Request #7288 · TheHPXProject/hpx

ArivoliR · 2026-05-21T22:17:29Z

Partially fixes #7287
Follow-up to PR #6995

Proposed Changes

Fix vector_pack_load::unaligned (std-experimental-simd backend) to perform a real unaligned load instead of broadcasting *iter across all lanes; symmetric fix to the store.
Rewrite the datapar sequential_search_n to use a sliding-window SIMD scan (broadcast value, none_of skip, all_of carry-extend, lane-by-lane in mixed packs) instead of the per-position scan-with-const_loop_n approach.
Rewrite the datapar sequential_search along the same lines: SIMD-broadcast needle[0] across the haystack, scalar-verify the full needle only on lane hits. Drops the per-position zip_iterator + nested SIMD dispatch.
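
The sliding-window scan described for sequential_search_n can be sketched roughly as follows. This is a minimal scalar emulation, not the code from this PR: the pack is modelled as a fixed group of `pack_width` elements with a boolean mask, and the names `search_n_sliding` and `pack_width` are hypothetical. It assumes the function returns the index of the first run, or the input size if there is none.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical lane count; a real backend would take this from the pack type.
constexpr std::size_t pack_width = 8;

// Returns the start index of the first run of `count` elements equal to
// `value`, or data.size() if no such run exists.
std::size_t search_n_sliding(
    std::vector<int> const& data, std::size_t count, int value)
{
    std::size_t const n = data.size();
    if (count == 0)
        return 0;

    std::size_t run = 0;    // length of the match run ending just before i
    std::size_t i = 0;
    for (; i + pack_width <= n; i += pack_width)
    {
        // Emulates "compare the pack against a broadcast of value".
        std::array<bool, pack_width> mask{};
        std::size_t hits = 0;
        for (std::size_t lane = 0; lane != pack_width; ++lane)
        {
            mask[lane] = data[i + lane] == value;
            hits += mask[lane] ? 1 : 0;
        }

        if (hits == 0)    // none_of: no lane matches, skip the whole pack
        {
            run = 0;
            continue;
        }
        if (hits == pack_width)    // all_of: extend the carried run
        {
            run += pack_width;
            if (run >= count)
                return i + pack_width - run;    // start of the carried run
            continue;
        }
        // Mixed pack: fall back to lane-by-lane bookkeeping.
        for (std::size_t lane = 0; lane != pack_width; ++lane)
        {
            if (!mask[lane])
                run = 0;
            else if (++run == count)
                return i + lane + 1 - count;
        }
    }
    // Scalar tail for the elements that do not fill a whole pack.
    for (; i < n; ++i)
    {
        if (data[i] != value)
            run = 0;
        else if (++run == count)
            return i + 1 - count;
    }
    return n;
}
```

The carried `run` counter is what lets a match span pack boundaries without re-reading earlier elements.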

Any background context you want to provide?

Follow-up to "datapar support for hpx::search and search_n" (#6995), in which I added the initial datapar overloads for search and search_n. While benchmarking, I found a correctness issue and a performance regression, both of which are addressed in this PR.

Checklist

Not all points below apply to all pull requests.

I have added a new feature and have added tests to go along with it.
I have fixed a bug and have added a regression test.
I have added a test using random numbers; I have made sure it uses a seed, and that random numbers generated are valid inputs for the tests.

Benchmarks

`hpx::search`

| Vector size | Needle | std | seq | par | simd | par_simd | par_simd vs std |
|---|---|---|---|---|---|---|---|
| 10M | 8 | 0.0037 | 0.0226 | 0.0014 | 0.0226 | 0.0019 | 1.95× |
| 200M | 8 | 0.0617 | 0.3842 | 0.0283 | 0.3881 | 0.0262 | 2.35× |
| 500M | 8 | 0.1537 | 0.9658 | 0.0910 | 0.9642 | 0.0785 | 1.96× |
| 500M | 32 | 0.1570 | 1.0017 | 0.0945 | 1.0278 | 0.0856 | 1.83× |
| 1B | 8 | 0.3310 | 2.1169 | 0.1866 | 2.1663 | 0.1482 | 2.23× |

`hpx::search_n`

| Vector size | Needle | std | seq | par | simd | par_simd | par_simd vs std |
|---|---|---|---|---|---|---|---|
| 10M | 8 | 0.0012 | 0.0090 | 0.0019 | 0.0094 | 0.0014 | 0.86× |
| 200M | 8 | 0.0334 | 0.1556 | 0.0333 | 0.1577 | 0.0269 | 1.24× |
| 500M | 8 | 0.0803 | 0.3928 | 0.0999 | 0.4063 | 0.0812 | 0.99× |
| 500M | 32 | 0.0614 | 0.3851 | 0.0943 | 0.3842 | 0.0865 | 0.71× |
| 1B | 8 | 0.1574 | 0.7961 | 0.1972 | 0.8004 | 0.1465 | 1.07× |

Signed-off-by: ArivoliR <arivoli2005@gmail.com>

codacy-production · 2026-05-21T22:19:19Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves SIMD/datapar behavior for search and search_n, and adds performance benchmarks to compare HPX execution policies against the standard library.

Changes:

Updated SIMD vector pack unaligned load/store to use explicit (element-aligned) pointer-based loads/stores for non-scalar packs.
Reworked datapar implementations in search.hpp to use SIMD prefiltering (for search) and a sliding-window SIMD scan (for search_n).
Added new performance benchmarks for search and search_n and registered them in the performance CMake target list.
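
To illustrate the difference between the old broadcasting behaviour and a real unaligned load, here is a small sketch that emulates a pack with `std::array`. It is not the HPX implementation; `pack`, `lanes`, `load_broadcast`, `load_unaligned`, and `store_unaligned` are hypothetical names. With `std::experimental::simd`, the corresponding operations would presumably be `copy_from(ptr, element_aligned)` and `copy_to(ptr, element_aligned)`.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD pack of four int lanes.
constexpr std::size_t lanes = 4;
using pack = std::array<int, lanes>;

// The behaviour being fixed: every lane receives the single value *p.
pack load_broadcast(int const* p)
{
    pack v;
    v.fill(*p);
    return v;
}

// A real unaligned load: lane k receives p[k], with no alignment
// requirement on p beyond that of int.
pack load_unaligned(int const* p)
{
    pack v;
    std::copy_n(p, lanes, v.begin());
    return v;
}

// The symmetric store: lane k is written to p[k].
void store_unaligned(pack const& v, int* p)
{
    std::copy_n(v.begin(), lanes, p);
}
```

With a buffer holding 10, 11, 12, 13, 14, 15, loading from offset 1 yields four copies of 11 under the broadcast version and 11, 12, 13, 14 under the unaligned load, which is why algorithms built on the former could compare against the wrong elements.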

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| libs/core/execution/include/hpx/execution/traits/detail/simd/vector_pack_load_store.hpp | Adjusts unaligned vector pack load/store to use pointer-based element-aligned operations for non-scalar packs. |
| libs/core/algorithms/include/hpx/parallel/datapar/search.hpp | Introduces new SIMD-accelerated scanning strategies for `search` and `search_n`. |
| libs/core/algorithms/tests/performance/benchmark_search.cpp | Adds a benchmark comparing `hpx::search` policies vs `std::search`. |
| libs/core/algorithms/tests/performance/benchmark_search_n.cpp | Adds a benchmark comparing `hpx::search_n` policies vs `std::search_n`. |
| libs/core/algorithms/tests/performance/CMakeLists.txt | Registers the new benchmark executables. |

+
+    std::cout << "\n-------------- Benchmark Result --------------"
+              << std::endl;
+    auto fmt = "search ({1}) : {2}(sec)";


+
+    std::cout << "\n-------------- Benchmark Result --------------"
+              << std::endl;
+    auto fmt = "search_n ({1}) : {2}(sec)";


+    for (int i = 0; i < test_count; ++i)
+    {
+        std::size_t pos = (n * 3 / 4) +
+            (static_cast<std::size_t>(i) * 1000003) % (n / 4 - needle_count);


+#include <hpx/init.hpp>
+#include <hpx/modules/testing.hpp>
+#include <hpx/program_options.hpp>
+


+            // SIMD bulk pass
+            for (; i + pack_size <= scan_count; i += pack_size)
+            {
+                if (tok.was_cancelled(base_idx))


+            using value_type =
+                typename std::iterator_traits<Iter1>::value_type;
+            using pack_type =
+                hpx::parallel::traits::vector_pack_type_t<value_type>;
+            constexpr std::size_t pack_size =
+                hpx::parallel::traits::vector_pack_size_v<pack_type>;
+
+            // First-element broadcast: SIMD-scan haystack for matches of
+            // needle[0]; only candidates pay the scalar full-needle verify.
+            auto const needle0 = HPX_INVOKE(proj2, *s_first);
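
The first-element prefilter named in the comment above can be sketched as follows. This is a simplified scalar emulation under stated assumptions, not the code from the PR: projections and the cancellation token are left out, the pack is emulated with a boolean mask, and `search_prefilter` is a hypothetical name. The identifiers `pack_size`, `scan_count`, and `needle0` are reused from the diff only for readability.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical lane count; the diff derives it from the pack type instead.
constexpr std::size_t pack_size = 8;

// Returns the index of the first occurrence of needle in hay,
// or hay.size() if there is none.
std::size_t search_prefilter(
    std::vector<int> const& hay, std::vector<int> const& needle)
{
    std::size_t const n = hay.size();
    std::size_t const m = needle.size();
    if (m == 0)
        return 0;
    if (m > n)
        return n;

    // Number of start positions at which the needle still fits.
    std::size_t const scan_count = n - m + 1;
    int const needle0 = needle[0];

    // Scalar full-needle verification, only run on candidate positions.
    auto verify = [&](std::size_t pos) {
        return std::equal(needle.begin(), needle.end(),
            hay.begin() + static_cast<std::ptrdiff_t>(pos));
    };

    std::size_t i = 0;
    for (; i + pack_size <= scan_count; i += pack_size)
    {
        // Emulates comparing one pack against a broadcast of needle0.
        std::array<bool, pack_size> mask{};
        bool any = false;
        for (std::size_t lane = 0; lane != pack_size; ++lane)
        {
            mask[lane] = hay[i + lane] == needle0;
            any = any || mask[lane];
        }
        if (!any)
            continue;    // no candidate in this pack
        for (std::size_t lane = 0; lane != pack_size; ++lane)
        {
            if (mask[lane] && verify(i + lane))
                return i + lane;
        }
    }
    // Scalar tail for the remaining start positions.
    for (; i < scan_count; ++i)
    {
        if (hay[i] == needle0 && verify(i))
            return i;
    }
    return n;
}
```

The design bet is that needle[0] is rare enough in the haystack that most packs are rejected by the single mask test, so the scalar verify runs only on a small fraction of positions.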


StellarBot · 2026-05-21T22:20:05Z

Can one of the admins verify this patch?

hkaiser

Could you please apply the same fix for the other datapar backends (if necessary)?

hkaiser · 2026-05-22T13:44:26Z

@ArivoliR Also, could you please add a test that verifies your unaligned load fix?

ArivoliR · 2026-05-22T18:18:36Z

> @ArivoliR Also, could you please add a test that verifies your unaligned load fix?

On it!

ArivoliR added 4 commits May 22, 2026 03:26

fix vector_pack_load::unaligned for full SIMD packs

6ccbd84

Signed-off-by: ArivoliR <arivoli2005@gmail.com>

rewrite datapar sequential_search_n with sliding-window SIMD

81d9bf9

Signed-off-by: ArivoliR <arivoli2005@gmail.com>

rewrite datapar sequential_search without per-position zip iterator

a0b29b2

Signed-off-by: ArivoliR <arivoli2005@gmail.com>

add benchmarks for hpx::search and hpx::search_n

994fcc9

Signed-off-by: ArivoliR <arivoli2005@gmail.com>

ArivoliR requested a review from hkaiser as a code owner May 21, 2026 22:17

Copilot AI review requested due to automatic review settings May 21, 2026 22:17

Copilot AI reviewed May 21, 2026

hkaiser added the labels type: enhancement, type: compatibility issue, and category: algorithms on May 21, 2026

hkaiser reviewed May 22, 2026

ArivoliR wants to merge 4 commits into TheHPXProject:master from ArivoliR:datapar-search-fixes
