Datapar search fixes #7288

Open
ArivoliR wants to merge 4 commits into TheHPXProject:master from ArivoliR:datapar-search-fixes

@ArivoliR
Contributor

Partially fixes #7287
Follow-up to PR #6995

Proposed Changes

  • Fix vector_pack_load::unaligned (std-experimental-simd backend) to perform a real unaligned load instead of broadcasting *iter across all lanes; symmetric fix to the store.
  • Rewrite the datapar sequential_search_n to use a sliding-window SIMD scan (broadcast value, none_of skip, all_of carry-extend, lane-by-lane in mixed packs) instead of the per-position scan-with-const_loop_n approach.
  • Rewrite the datapar sequential_search along the same lines: SIMD-broadcast needle[0] across the haystack, scalar-verify the full needle only on lane hits. Drops the per-position zip_iterator + nested SIMD dispatch.
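The difference between the broadcast behaviour and a real unaligned load, described in the first bullet, can be sketched with `std::experimental::simd` (Parallelism TS v2, shipped with libstdc++). The helper names `broadcast_load`, `unaligned_load`, `unaligned_store`, and `lanes_of` are hypothetical and are not the HPX `vector_pack_load` traits; this is only an illustration of the two patterns.

```cpp
#include <cstddef>
#include <experimental/simd>
#include <vector>

namespace stdx = std::experimental;

// Hypothetical alias; HPX uses its own vector_pack traits instead.
template <typename T>
using pack_t = stdx::native_simd<T>;

// Problematic pattern: constructing a pack from a single scalar copies
// that one value into every lane.
template <typename T>
pack_t<T> broadcast_load(T const* p)
{
    return pack_t<T>(*p);
}

// Intended pattern: read pack_t<T>::size() consecutive elements starting
// at p, requiring only element alignment.
template <typename T>
pack_t<T> unaligned_load(T const* p)
{
    pack_t<T> v;
    v.copy_from(p, stdx::element_aligned);
    return v;
}

// Symmetric store: write every lane to consecutive memory locations.
template <typename Pack>
void unaligned_store(Pack const& v, typename Pack::value_type* p)
{
    v.copy_to(p, stdx::element_aligned);
}

// Helper that copies the lanes of a pack into a std::vector for inspection.
template <typename Pack>
std::vector<typename Pack::value_type> lanes_of(Pack const& v)
{
    std::vector<typename Pack::value_type> out(Pack::size());
    unaligned_store(v, out.data());
    return out;
}
```

Loading from `data + 1` with `unaligned_load` yields the elements `data[1], data[2], ...`, whereas `broadcast_load` yields `data[1]` in every lane, so any comparison against a haystack pack only ever sees one element.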

Any background context you want to provide?

Checklist

Not all points below apply to all pull requests.

  • I have added a new feature and have added tests to go along with it.
  • I have fixed a bug and have added a regression test.
  • I have added a test using random numbers; I have made sure it uses a seed, and that random numbers generated are valid inputs for the tests.

Benchmarks

hpx::search

| Vector size | Needle | std | seq | par | simd | par_simd | par_simd vs std |
|---|---|---|---|---|---|---|---|
| 10M | 8 | 0.0037 | 0.0226 | 0.0014 | 0.0226 | 0.0019 | 1.95× |
| 200M | 8 | 0.0617 | 0.3842 | 0.0283 | 0.3881 | 0.0262 | 2.35× |
| 500M | 8 | 0.1537 | 0.9658 | 0.0910 | 0.9642 | 0.0785 | 1.96× |
| 500M | 32 | 0.1570 | 1.0017 | 0.0945 | 1.0278 | 0.0856 | 1.83× |
| 1B | 8 | 0.3310 | 2.1169 | 0.1866 | 2.1663 | 0.1482 | 2.23× |

hpx::search_n

| Vector size | Needle | std | seq | par | simd | par_simd | par_simd vs std |
|---|---|---|---|---|---|---|---|
| 10M | 8 | 0.0012 | 0.0090 | 0.0019 | 0.0094 | 0.0014 | 0.86× |
| 200M | 8 | 0.0334 | 0.1556 | 0.0333 | 0.1577 | 0.0269 | 1.24× |
| 500M | 8 | 0.0803 | 0.3928 | 0.0999 | 0.4063 | 0.0812 | 0.99× |
| 500M | 32 | 0.0614 | 0.3851 | 0.0943 | 0.3842 | 0.0865 | 0.71× |
| 1B | 8 | 0.1574 | 0.7961 | 0.1972 | 0.8004 | 0.1465 | 1.07× |

ArivoliR added 4 commits May 22, 2026 03:26
Signed-off-by: ArivoliR <arivoli2005@gmail.com>
Signed-off-by: ArivoliR <arivoli2005@gmail.com>
Signed-off-by: ArivoliR <arivoli2005@gmail.com>
Signed-off-by: ArivoliR <arivoli2005@gmail.com>
@ArivoliR ArivoliR requested a review from hkaiser as a code owner May 21, 2026 22:17
Copilot AI review requested due to automatic review settings May 21, 2026 22:17
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves SIMD/datapar behavior for search and search_n, and adds performance benchmarks to compare HPX execution policies against the standard library.

Changes:

  • Updated SIMD vector pack unaligned load/store to use explicit (element-aligned) pointer-based loads/stores for non-scalar packs.
  • Reworked datapar implementations in search.hpp to use SIMD prefiltering (for search) and a sliding-window SIMD scan (for search_n).
  • Added new performance benchmarks for search and search_n and registered them in the performance CMake target list.
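The sliding-window scan for search_n mentioned above (broadcast the value, skip packs where no lane matches, extend a carried run when all lanes match, fall back to lane-by-lane in mixed packs) could look roughly like the following. This is a sketch under assumptions, not the code in search.hpp: it uses `std::experimental::simd` directly rather than the HPX pack traits, works on raw pointers, ignores projections and cancellation tokens, and the name `search_n_simd` is hypothetical.

```cpp
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical free function: returns the index of the first run of `count`
// consecutive elements equal to `value`, or n if there is no such run.
template <typename T>
std::size_t search_n_simd(
    T const* data, std::size_t n, std::size_t count, T value)
{
    using pack = stdx::native_simd<T>;
    constexpr std::size_t width = pack::size();

    if (count == 0)
        return 0;    // an empty run matches at the beginning

    pack const needle(value);    // broadcast the value once
    std::size_t run = 0;         // length of the matching run ending before i
    std::size_t i = 0;

    for (; i + width <= n; i += width)
    {
        pack chunk;
        chunk.copy_from(data + i, stdx::element_aligned);
        auto const mask = (chunk == needle);

        if (stdx::none_of(mask))
        {
            run = 0;    // no lane matches: any carried run is broken
        }
        else if (stdx::all_of(mask))
        {
            run += width;    // whole pack extends the carried run
            if (run >= count)
                return i + width - run;    // start of the run
        }
        else
        {
            // Mixed pack: update the run one lane at a time.
            for (std::size_t lane = 0; lane < width; ++lane)
            {
                run = mask[lane] ? run + 1 : 0;
                if (run == count)
                    return i + lane + 1 - count;
            }
        }
    }

    // Scalar tail for the elements that do not fill a whole pack.
    for (; i < n; ++i)
    {
        run = (data[i] == value) ? run + 1 : 0;
        if (run == count)
            return i + 1 - count;
    }
    return n;
}
```

The carried `run` counter is what lets a match span pack boundaries without re-reading earlier elements; the scalar tail reuses the same counter so a run may also end in the remainder.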

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| libs/core/execution/include/hpx/execution/traits/detail/simd/vector_pack_load_store.hpp | Adjusts unaligned vector pack load/store to use pointer-based element-aligned operations for non-scalar packs. |
| libs/core/algorithms/include/hpx/parallel/datapar/search.hpp | Introduces new SIMD-accelerated scanning strategies for search and search_n. |
| libs/core/algorithms/tests/performance/benchmark_search.cpp | Adds a benchmark comparing hpx::search policies vs std::search. |
| libs/core/algorithms/tests/performance/benchmark_search_n.cpp | Adds a benchmark comparing hpx::search_n policies vs std::search_n. |
| libs/core/algorithms/tests/performance/CMakeLists.txt | Registers the new benchmark executables. |


    std::cout << "\n-------------- Benchmark Result --------------"
              << std::endl;
    auto fmt = "search ({1}) : {2}(sec)";

    std::cout << "\n-------------- Benchmark Result --------------"
              << std::endl;
    auto fmt = "search_n ({1}) : {2}(sec)";

Comment on lines +32 to +35

    for (int i = 0; i < test_count; ++i)
    {
        std::size_t pos = (n * 3 / 4) +
            (static_cast<std::size_t>(i) * 1000003) % (n / 4 - needle_count);

Comment on lines +34 to +35

        std::size_t pos = (n * 3 / 4) +
            (static_cast<std::size_t>(i) * 1000003) % (n / 4 - needle_count);
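The `pos` expression above is only well defined when `n / 4` is strictly greater than `needle_count`: if the two are equal the modulus is zero, and if `needle_count` is larger the subtraction wraps, assuming both are unsigned (`std::size_t`), which the excerpt does not show. A guarded variant might look like this; the helper name `needle_position` and the choice to return `std::optional` are assumptions made for illustration, not part of the PR.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical helper: mirrors the benchmark's placement formula but refuses
// inputs for which the modulus would be zero or the subtraction would wrap.
inline std::optional<std::size_t> needle_position(
    std::size_t n, std::size_t needle_count, std::size_t i)
{
    std::size_t const quarter = n / 4;
    if (quarter <= needle_count)
        return std::nullopt;    // n / 4 - needle_count would be 0 or wrap

    std::size_t const span = quarter - needle_count;
    // Same formula as the benchmark: a position in the last quarter of the
    // range, leaving room for needle_count elements before the end.
    return (n * 3 / 4) + (i * 1000003) % span;
}
```

A caller can then skip or reject configurations where no valid position exists instead of relying on the input sizes being large enough.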
    #include <hpx/init.hpp>
    #include <hpx/modules/testing.hpp>
    #include <hpx/program_options.hpp>

    #include <hpx/init.hpp>
    #include <hpx/modules/testing.hpp>
    #include <hpx/program_options.hpp>

    // SIMD bulk pass
    for (; i + pack_size <= scan_count; i += pack_size)
    {
        if (tok.was_cancelled(base_idx))

Comment on lines +45 to +54

    using value_type =
        typename std::iterator_traits<Iter1>::value_type;
    using pack_type =
        hpx::parallel::traits::vector_pack_type_t<value_type>;
    constexpr std::size_t pack_size =
        hpx::parallel::traits::vector_pack_size_v<pack_type>;

    // First-element broadcast: SIMD-scan haystack for matches of
    // needle[0]; only candidates pay the scalar full-needle verify.
    auto const needle0 = HPX_INVOKE(proj2, *s_first);
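The first-element broadcast strategy described in the comment above can be sketched independently of HPX as follows. This is an illustration under assumptions: it uses `std::experimental::simd` instead of the HPX pack traits, takes raw pointers, and leaves out projections, predicates, and the cancellation token; `search_simd` is a hypothetical name, and `scan_count` is taken to mean the number of candidate start positions.

```cpp
#include <algorithm>
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Hypothetical free function: returns the index of the first occurrence of
// needle[0..m) in hay[0..n), or n if the needle does not occur.
template <typename T>
std::size_t search_simd(
    T const* hay, std::size_t n, T const* needle, std::size_t m)
{
    if (m == 0)
        return 0;    // an empty needle matches at the beginning
    if (m > n)
        return n;

    using pack = stdx::native_simd<T>;
    constexpr std::size_t width = pack::size();

    // Only positions that leave room for the whole needle are candidates.
    std::size_t const scan_count = n - m + 1;
    pack const first(needle[0]);    // broadcast needle[0] to every lane

    // Scalar verification of the full needle at one candidate position.
    auto verify = [&](std::size_t pos) {
        return std::equal(needle, needle + m, hay + pos);
    };

    std::size_t i = 0;
    for (; i + width <= scan_count; i += width)
    {
        pack chunk;
        chunk.copy_from(hay + i, stdx::element_aligned);
        auto const mask = (chunk == first);
        if (stdx::none_of(mask))
            continue;    // no lane starts with needle[0]

        for (std::size_t lane = 0; lane < width; ++lane)
        {
            if (mask[lane] && verify(i + lane))
                return i + lane;
        }
    }

    // Scalar tail over the remaining candidate positions.
    for (; i < scan_count; ++i)
    {
        if (hay[i] == needle[0] && verify(i))
            return i;
    }
    return n;
}
```

Lanes are checked in ascending order, so the first verified hit is also the first occurrence; the cost of the full comparison is paid only where the first element already matches.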
@StellarBot
Collaborator

Can one of the admins verify this patch?

Contributor

@hkaiser hkaiser left a comment

Could you please apply the same fix for the other datapar backends (if necessary)?

@hkaiser
Contributor

hkaiser commented May 22, 2026

@ArivoliR Also, could you please add a test that verifies your unaligned load fix?

@ArivoliR
Contributor Author

> @ArivoliR Also, could you please add a test that verifies your unaligned load fix?

On it!

Development

Successfully merging this pull request may close these issues.

vector_pack_load::unaligned broadcasts scalar instead of loading pack