fix(build): only enable SIMDE_BACKEND for non-x86 architectures #254
Merged
- SIMDE_BACKEND was unconditionally enabled for all vectorscan builds, which disables native x86 CPU feature detection and caps performance at SSE2 level
- on x86-64, this caused a ~2.5-13x throughput regression vs v0.7.21 because vectorscan's runtime dispatch to SSE4.2/AVX2/AVX512 code paths was completely bypassed
- now only enables SIMDE_BACKEND on ARM and other non-x86 architectures where vectorscan genuinely needs the SIMD emulation layer
- add benchmark script for reproducing and validating the regression
- GitHub deprecated macos-13 (Intel) runners
- macOS x86_64 wheels are now cross-compiled on ARM runners via Rosetta 2, which cibuildwheel handles natively
- vectorscan 5.4.12 uses -march=x86-64-v2 in cflags-x86.cmake and archdetect.cmake, but GCC <11 (manylinux2014 devtoolset) does not recognize this value
- patch source at build time to use -march=nehalem, which provides the same SSE4.2 baseline and is also recognized by the older GCC releases used there
- only applied when using native x86 backend (not SIMDE_BACKEND)
- use CMAKE_OSX_ARCHITECTURES (target arch) instead of CMAKE_SYSTEM_PROCESSOR (host arch) for SIMDE_BACKEND decision on macOS, so cross-compiling x86_64 on ARM correctly disables SIMDE and builds native x86 vectorscan
- forward CMAKE_OSX_ARCHITECTURES to ExternalProject_Add so vectorscan builds for the correct target architecture
- handle BSD sed -i syntax difference on macOS for the x86-64-v2 → nehalem patch
- CMake's list handling drops the empty string in sed -i "", causing BSD sed to fail with "rename(): No such file or directory"
- perl -pi -e works identically on Linux and macOS
- uv 0.10.2 leaks host Python 3.12 stdlib into cibuildwheel venvs on Windows, causing SRE module mismatch and import errors for non-3.12 Python targets
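
The x86-64-v2 → nehalem substitution described in the commits above is a plain in-place text replacement, which is why `perl -pi -e` can stand in for `sed -i` on both platforms. As a rough illustration only, the same step could be sketched in Python as below. The function name `patch_march`, its parameters, and the choice to skip the patch via a boolean flag are hypothetical and are not taken from this PR's build scripts; only the two flag values and the "native x86 backend only" condition come from the commit messages.

```python
from pathlib import Path

# Values taken from the commit message; everything else here is illustrative.
OLD_ARCH = "x86-64-v2"
NEW_ARCH = "nehalem"


def patch_march(paths, use_simde_backend):
    """Replace OLD_ARCH with NEW_ARCH in each file and return the files changed.

    The patch is skipped entirely when the SIMDE backend is in use,
    mirroring the "only applied when using native x86 backend" rule.
    """
    if use_simde_backend:
        return []

    changed = []
    for path in map(Path, paths):
        text = path.read_text()
        patched = text.replace(OLD_ARCH, NEW_ARCH)
        if patched != text:
            # Rewrite the file only when something actually changed.
            path.write_text(patched)
            changed.append(path)
    return changed
```

Because the replacement is idempotent, running it twice on the same file leaves the file untouched the second time, which keeps repeated configure runs safe.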
Summary
Fixes #253: a performance regression in v0.8.0 (and v0.7.23+) caused by unconditionally enabling `SIMDE_BACKEND=ON` for all vectorscan builds.
- `SIMDE_BACKEND=ON` replaces vectorscan's native x86 CPU detection with a stub that reports zero CPU features, disabling all SSE4.2/AVX2/AVX512 code paths and capping performance at SSE2 level
- Now only enables `SIMDE_BACKEND` on ARM and other non-x86 architectures where vectorscan genuinely needs the SIMD emulation layer

Benchmark Results (50 patterns, 500KB documents, Ryzen 7 5800X)
Root Cause
Commit 8df0fcd (v0.7.23) added `-DSIMDE_BACKEND=ON` to maximize wheel compatibility across CPU variants. However, SIMDE_BACKEND on x86-64:
- Replaces `src/util/arch/x86/cpuid_flags.c` with a SIMDE stub returning 0 (no features)
- Disables `__builtin_constant_p()` optimizations in supervector operations
- Uses `HS_TUNE_FAMILY_GENERIC` instead of CPU-specific tuning

The "compatibility" benefit is negligible on x86-64 since SSE4.2 (vectorscan's minimum requirement) has been available since Intel Nehalem (2008).
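
The fix amounts to choosing the backend from the target architecture rather than enabling SIMDE everywhere. The real decision lives in the CMake build, so the Python below is only a model of that logic for illustration. The function name `needs_simde_backend`, the set of names treated as x86, and the assumption that `CMAKE_OSX_ARCHITECTURES` holds a single architecture are all hypothetical.

```python
# Architecture spellings treated as x86 here are an assumption for illustration.
X86_ARCHS = {"x86_64", "amd64", "x86", "i386", "i686"}


def needs_simde_backend(system_processor, osx_architectures=""):
    """Return True when the SIMD emulation layer should be enabled.

    system_processor models CMAKE_SYSTEM_PROCESSOR (the host architecture).
    osx_architectures models CMAKE_OSX_ARCHITECTURES (the target on macOS)
    and is assumed to name a single architecture when set.
    """
    # Prefer the target architecture when it is set: the host value is
    # misleading when cross-compiling x86_64 wheels on an ARM machine.
    target = osx_architectures.strip() or system_processor.strip()
    return target.lower() not in X86_ARCHS


# Usage: an ARM host building an x86_64 macOS wheel keeps the native backend.
cross_compile_uses_simde = needs_simde_backend("arm64", "x86_64")
native_arm_uses_simde = needs_simde_backend("arm64")
```

Keying the decision on the target is what lets a cross-compiled x86_64 build keep vectorscan's runtime dispatch to SSE4.2/AVX2/AVX512, while native ARM builds still get the emulation layer.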
Test plan