fix(build): only enable SIMDE_BACKEND for non-x86 architectures by darvid · Pull Request #254 · darvid/python-hyperscan

darvid · 2026-02-11T03:59:57Z

Summary

Fixes #253, a performance regression in v0.7.23 and later (including v0.8.0) caused by unconditionally enabling SIMDE_BACKEND=ON for all vectorscan builds.

- SIMDE_BACKEND=ON replaces vectorscan's native x86 CPU detection with a stub that reports zero CPU features, disabling all SSE4.2/AVX2/AVX512 code paths and capping performance at SSE2 level
- This caused a ~2.5-13x throughput regression depending on workload complexity
- Now only enables SIMDE_BACKEND on ARM and other non-x86 architectures where vectorscan genuinely needs the SIMD emulation layer
- x86-64 builds use the native backend with runtime CPU feature detection, restoring full performance
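
The architecture gate described above can be modeled roughly as follows. This is an illustrative Python sketch, not the project's CMake logic: the function name `wants_simde_backend` and the set of x86 spellings are assumptions made for the example.

```python
# Hypothetical model of the build-time decision; the real check lives in CMake.
X86_NAMES = {"x86_64", "amd64", "x64", "i386", "i686", "x86"}  # assumed spellings


def wants_simde_backend(target_arch: str) -> bool:
    """Return True when the SIMD emulation layer should be enabled."""
    return target_arch.strip().lower() not in X86_NAMES


for arch in ("x86_64", "AMD64", "arm64", "aarch64"):
    state = "ON" if wants_simde_backend(arch) else "OFF"
    print(f"{arch}: SIMDE_BACKEND={state}")
```

Normalizing case before the comparison matters because Windows toolchains commonly report `AMD64` while Linux reports `x86_64`.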

Benchmark Results (50 patterns, 500KB documents, Ryzen 7 5800X)

| Build | Avg Time/Scan | Throughput |
| --- | --- | --- |
| Before (SIMDE_BACKEND=ON) | 6.7 ms | 70.8 MB/s |
| After (SIMDE_BACKEND=OFF on x86) | 2.6 ms | 182.2 MB/s |
| Reporter's v0.7.19 baseline | 3.2 ms | 154.3 MB/s |
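
As a quick sanity check, the speedup implied by the before and after rows can be computed directly from the reported figures. The snippet only does arithmetic on the numbers in the table; the variable names are illustrative.

```python
# Figures copied from the benchmark results for the before and after builds.
before_ms, after_ms = 6.7, 2.6
before_mbps, after_mbps = 70.8, 182.2

time_ratio = before_ms / after_ms            # how much longer each scan took before the fix
throughput_ratio = after_mbps / before_mbps  # how much throughput the fix recovers

print(f"time ratio: {time_ratio:.2f}x, throughput ratio: {throughput_ratio:.2f}x")
```

Both ratios come out at roughly 2.6x, which sits at the low end of the ~2.5-13x range quoted in the summary for this particular workload.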

Root Cause

Commit 8df0fcd (v0.7.23) added -DSIMDE_BACKEND=ON to maximize wheel compatibility across CPU variants. However, SIMDE_BACKEND on x86-64:

- Replaces src/util/arch/x86/cpuid_flags.c with a SIMDE stub returning 0 (no features)
- Disables all higher ISA dispatch (AVX2, AVX512, SSE4.2 string instructions)
- Disables __builtin_constant_p() optimizations in supervector operations
- Forces HS_TUNE_FAMILY_GENERIC instead of CPU-specific tuning

The "compatibility" benefit is negligible on x86-64 since SSE4.2 (vectorscan's minimum requirement) has been available since Intel Nehalem (2008).
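
To illustrate why a stub that reports zero features is so costly, here is a toy model of feature-based dispatch. It is only a sketch: vectorscan's real dispatch is implemented in C, the flag names follow Linux `/proc/cpuinfo` spellings, and the tier table and `select_isa` helper are assumptions made for this example.

```python
# Illustrative model only; not vectorscan's actual dispatch code.
ISA_TIERS = [  # highest preference first; required flags per tier are assumed
    ("avx512", {"avx512f", "avx512bw"}),
    ("avx2", {"avx2"}),
    ("sse4.2", {"sse4_2"}),
]


def select_isa(cpu_flags: set) -> str:
    """Pick the best tier whose required flags are all present."""
    for name, required in ISA_TIERS:
        if required <= cpu_flags:
            return name
    return "baseline"


native = {"sse2", "sse4_2", "avx", "avx2"}  # what an AVX2-capable CPU might report
stubbed = set()                             # a stub that reports zero features

print(select_isa(native))   # avx2
print(select_isa(stubbed))  # baseline
```

With an empty feature set every tier check fails, so the fastest code paths are never selected regardless of what the hardware supports.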

Test plan

- All 32 existing tests pass
- Benchmark confirms throughput restored to v0.7.19 levels
- CI passes on all platforms (x86-64 Linux, macOS, ARM)
- Verify ARM wheels still build correctly with SIMDE_BACKEND=ON
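
For anyone wanting to reproduce the measurement, a minimal timing harness might look like the sketch below. It is not the benchmark script added in this PR: the `benchmark` helper is hypothetical, and the stdlib `re` module stands in for a compiled hyperscan database so the snippet runs without the extension installed.

```python
import re
import time


def benchmark(scan, documents, repeats=5):
    """Return (average seconds per scan, throughput in MB/s) for a scan callable."""
    total_bytes = sum(len(d) for d in documents) * repeats
    start = time.perf_counter()
    for _ in range(repeats):
        for doc in documents:
            scan(doc)
    elapsed = time.perf_counter() - start
    scans = len(documents) * repeats
    return elapsed / scans, total_bytes / elapsed / 1e6


# Stand-in workload: stdlib `re` replaces the real hyperscan database here.
pattern = re.compile(rb"foo\d+|bar[a-z]{3}")
docs = [b"lorem foo123 ipsum barxyz " * 20_000]  # roughly 500 KB per document
avg_s, mbps = benchmark(lambda d: pattern.findall(d), docs)
print(f"{avg_s * 1e3:.2f} ms/scan, {mbps:.1f} MB/s")
```

To compare builds, the same harness would be run against each wheel with the `scan` callable wrapping the database's scan call, keeping patterns and documents fixed.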

- SIMDE_BACKEND was unconditionally enabled for all vectorscan builds, which disables native x86 CPU feature detection and caps performance at SSE2 level
- on x86-64, this caused a ~2.5-13x throughput regression vs v0.7.21 because vectorscan's runtime dispatch to SSE4.2/AVX2/AVX512 code paths was completely bypassed
- now only enables SIMDE_BACKEND on ARM and other non-x86 architectures where vectorscan genuinely needs the SIMD emulation layer
- add benchmark script for reproducing and validating the regression

- GitHub deprecated macos-13 (Intel) runners
- macOS x86_64 wheels are now cross-compiled on ARM runners via Rosetta 2, which cibuildwheel handles natively

- vectorscan 5.4.12 uses -march=x86-64-v2 in cflags-x86.cmake and archdetect.cmake, but GCC <11 (manylinux2014 devtoolset) does not recognize this value
- patch source at build time to use -march=nehalem which provides the same SSE4.2 baseline and is supported by all GCC versions
- only applied when using native x86 backend (not SIMDE_BACKEND)
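
The substitution itself is a plain text replacement. The sketch below shows the idea in Python rather than the sed/perl invocation the build actually uses; the `patch_march` helper and the sample flags line are made up for illustration.

```python
import pathlib
import tempfile


def patch_march(path: pathlib.Path) -> bool:
    """Rewrite -march=x86-64-v2 to -march=nehalem in place; return True if changed."""
    text = path.read_text()
    patched = text.replace("-march=x86-64-v2", "-march=nehalem")
    if patched != text:
        path.write_text(patched)
    return patched != text


# Demo on a throwaway file; the flags line is invented for this example.
with tempfile.TemporaryDirectory() as tmp:
    cmake_file = pathlib.Path(tmp) / "cflags-x86.cmake"
    cmake_file.write_text('set(ARCH_C_FLAGS "-march=x86-64-v2 -mtune=generic")\n')
    changed = patch_march(cmake_file)
    result = cmake_file.read_text()

print(changed, result.strip())
```

Returning whether the file changed makes it easy to detect a future vectorscan release where the flag no longer appears and the patch silently becomes a no-op.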

- use CMAKE_OSX_ARCHITECTURES (target arch) instead of CMAKE_SYSTEM_PROCESSOR (host arch) for SIMDE_BACKEND decision on macOS, so cross-compiling x86_64 on ARM correctly disables SIMDE and builds native x86 vectorscan
- forward CMAKE_OSX_ARCHITECTURES to ExternalProject_Add so vectorscan builds for the correct target architecture
- handle BSD sed -i syntax difference on macOS for the x86-64-v2 → nehalem patch
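
The target-versus-host distinction can be sketched as a small resolution step. This is a Python model of the idea, not the CMake code in this PR: `resolve_target_arch` is a hypothetical helper, and taking the first entry of a multi-arch list is a simplifying assumption that does not cover universal builds.

```python
def resolve_target_arch(system_processor: str, osx_architectures: str = "") -> str:
    """Prefer the explicit macOS target arch; otherwise fall back to the host value."""
    # CMAKE_OSX_ARCHITECTURES may be a semicolon-separated list;
    # this sketch simply takes the first non-empty entry.
    targets = [a for a in osx_architectures.split(";") if a]
    return (targets[0] if targets else system_processor).lower()


# Cross-compiling x86_64 wheels on an ARM macOS runner:
print(resolve_target_arch("arm64", "x86_64"))  # x86_64
# Native Linux ARM build with no macOS override set:
print(resolve_target_arch("aarch64"))          # aarch64
```

Keying the SIMDE_BACKEND decision on the resolved value rather than the host processor is what keeps the cross-compiled x86_64 wheel on the native backend.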

- CMake's list handling drops empty string in sed -i "" causing BSD sed to fail with "rename(): No such file or directory"
- perl -pi -e works identically on Linux and macOS

- uv 0.10.2 leaks host Python 3.12 stdlib into cibuildwheel venvs on Windows, causing SRE module mismatch and import errors for non-3.12 Python targets

darvid added 6 commits February 10, 2026 22:59

ci(build): replace deprecated macos-13 runners with macos-15

3b9a0d2


build: use perl for x86-64-v2 patch to fix macOS sed compat

5a81c77


ci(build): pin uv to 0.9.x to fix Windows build failures

3edefbd


darvid merged commit 5bc8cbe into main Feb 11, 2026
60 checks passed

darvid merged 6 commits into main from fix/simde-backend-x86-perf-253
