fix(cmake): enable FP16 NEON intrinsics on ARM64 GCC#205
fix(cmake): enable FP16 NEON intrinsics on ARM64 GCC#205thedonmon wants to merge 4 commits intoalibaba:mainfrom
Conversation
The ailego math kernels use FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, vld1q_f16, etc.) which require the +fp16 architecture extension on GCC. Apple Clang enables FP16 by default on ARM64, but GCC does not — it requires explicit -march=armv8.X-a+fp16. This patch: - Updates _detect_armv8_best() to probe for +fp16 support and append it to the detected march flag when available - Updates all explicit ENABLE_ARMV8.X options to include +fp16 Without this fix, building on Linux ARM64 with GCC fails with: "target specific option mismatch" for FP16 NEON intrinsics in src/ailego/math/*_fp16.cc
Tests the cmake +fp16 fix on: - linux-arm64 (ubuntu-24.04-arm) — the platform that was broken - linux-x64 (ubuntu-24.04) — regression check - macos-arm64 (macos-15) — regression check
Greptile SummaryThis PR fixes a Linux ARM64 GCC build failure by adding the The auto-detect path ( Confidence Score: 2/5
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[CMake Configure Start] --> B{AUTO_DETECT_ARCH?}
B -- Yes --> C{CMAKE_SYSTEM_PROCESSOR\naarch64/arm64?}
C -- No --> D[_detect_x86_best]
C -- Yes --> E[_detect_armv8_best]
E --> F["Loop: armv8.6-a → armv8-a"]
F --> G{check_c_compiler_flag\n-march=ver}
G -- Not Supported --> F
G -- Supported --> H{check_c_compiler_flag\n-march=ver+fp16}
H -- Supported --> I["_march_flag = -march=ver+fp16\nSTATUS: FP16 NEON enabled"]
H -- Not Supported --> J["_march_flag = -march=ver\nSTATUS: FP16 not supported"]
I --> K[Append to CMAKE_C_FLAGS\nand CMAKE_CXX_FLAGS]
J --> K
K --> Z[Done]
B -- No --> L{ENABLE_ARMV8.xA set?}
L -- No --> Z
L -- Yes --> M["add_arch_flag(-march=armv8.x-a+fp16)"]
M --> N{check_c_compiler_flag\n-march=armv8.x-a+fp16}
N -- Supported --> O[Append to FLAGS]
N -- Not Supported\nand option is ON --> P["FATAL_ERROR ⚠️\nNo fallback to base flag"]
O --> Z
Last reviewed commit: 2b5eb87 |
| if(ENABLE_ARMV8.6A) | ||
| add_arch_flag("-march=armv8.6-a" ARMV86A ENABLE_ARMV8.6A) | ||
| add_arch_flag("-march=armv8.6-a+fp16" ARMV86A ENABLE_ARMV8.6A) | ||
| endif() | ||
| if(ENABLE_ARMV8.5A) | ||
| add_arch_flag("-march=armv8.5-a" ARMV85A ENABLE_ARMV8.5A) | ||
| add_arch_flag("-march=armv8.5-a+fp16" ARMV85A ENABLE_ARMV8.5A) | ||
| endif() | ||
| if(ENABLE_ARMV8.4A) | ||
| add_arch_flag("-march=armv8.4-a" ARMV84A ENABLE_ARMV8.4A) | ||
| add_arch_flag("-march=armv8.4-a+fp16" ARMV84A ENABLE_ARMV8.4A) | ||
| endif() | ||
| if(ENABLE_ARMV8.3A) | ||
| add_arch_flag("-march=armv8.3-a" ARMV83A ENABLE_ARMV8.3A) | ||
| add_arch_flag("-march=armv8.3-a+fp16" ARMV83A ENABLE_ARMV8.3A) | ||
| endif() | ||
| if(ENABLE_ARMV8.2A) | ||
| add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A) | ||
| add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A) | ||
| endif() | ||
| if(ENABLE_ARMV8.1A) | ||
| add_arch_flag("-march=armv8.1-a" ARMV81A ENABLE_ARMV8.1A) | ||
| add_arch_flag("-march=armv8.1-a+fp16" ARMV81A ENABLE_ARMV8.1A) | ||
| endif() | ||
| if(ENABLE_ARMV8A) | ||
| add_arch_flag("-march=armv8-a" ARMV8A ENABLE_ARMV8A) | ||
| add_arch_flag("-march=armv8-a+fp16" ARMV8A ENABLE_ARMV8A) | ||
| endif() |
There was a problem hiding this comment.
The manual ENABLE_ARMV8.X options (lines 197–217) unconditionally hardcode +fp16 and pass the combined flag to add_arch_flag. This creates a regression for users who explicitly enable one of these options on a compiler that supports the base architecture but not the +fp16 extension (e.g., GCC ≤7 or older cross-compilation toolchains).
When check_c_compiler_flag rejects the combined flag, add_arch_flag will emit:
FATAL_ERROR "Compiler does not support required flag: '-march=armv8.2-a+fp16' for ENABLE_ARMV8.2A"
The auto-detect path (_detect_armv8_best, lines 77–103) handles this correctly by probing the base arch first, then probing +fp16 separately, and falling back gracefully. The manual path should follow the same pattern:
if(ENABLE_ARMV8.2A)
check_c_compiler_flag("-march=armv8.2-a+fp16" _COMP_SUPP_ARMV82A_fp16)
if(_COMP_SUPP_ARMV82A_fp16)
add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)
else()
add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
endif()
endif()(Repeat for each ENABLE_ARMV8.XA option.)
There was a problem hiding this comment.
Perfect! The changes look great. You've implemented exactly what I suggested:
- Created the
add_arch_flag_with_fp16macro (lines 90-99) that probes for+fp16support and falls back to the base architecture - Updated all manual ARM options (lines 206-222) to use this new macro instead of hardcoding
+fp16 - Added helpful status messages indicating whether FP16 is enabled or not
This prevents the FATAL_ERROR for users with older compilers (like GCC ≤7) that support ARMv8.X but not +fp16, while still enabling FP16 when available. The solution is clean, DRY, and matches the pattern already used in the auto-detect path.
✅ Issue resolved!
Address review feedback: the manual ENABLE_ARMV8.X options were hardcoding +fp16 without a fallback, which would cause FATAL_ERROR on compilers that support the base arch but not +fp16 (older GCC, cross-compilation toolchains). Adds add_arch_flag_with_fp16 macro that probes for +fp16 support first, then falls back to the base -march flag — matching the same pattern used in _detect_armv8_best() for auto-detection.
Should re-review, comments addressed. |
|
thanks @thedonmon . it looks do-able. as we don't have arm+fp16 handy, the verification may take some time. |
Summary
Fixes the Linux ARM64 GCC build failure caused by missing
+fp16in-marchflags.The
ailegomath kernels use FP16 NEON intrinsics (vfmaq_f16,vsubq_f16,vld1q_f16, etc.) which require the+fp16architecture extension on GCC. Apple Clang enables FP16 by default on ARM64, but GCC does not — it requires explicit-march=armv8.X-a+fp16.Without this fix, building on Linux ARM64 with GCC fails with:
in
src/ailego/math/*_fp16.ccfiles.Changes
cmake/option.cmake: Updated_detect_armv8_best()to probe for+fp16compiler support and append it to the detected march flag when availableENABLE_ARMV8.Xoptions to include+fp16Testing
Verified on all 3 platforms via CI on this fork:
ubuntu-24.04-arm): Build + C++ tests + Python tests — PASS (was failing before this fix)ubuntu-24.04): Build + C++ tests + Python tests — PASS (no regression)macos-15): Build + C++ tests + Python tests — PASS (no regression)CI run: https://github.com/thedonmon/zvec/actions/runs/22819664428
Note on PR #193
PR #193 (
refactor/march_based_reorganization) is refactoring the same area with per-ISA file dispatch. That PR has a bug in the NEON path:MATH_MARCH_FLAG_NEONis referenced insrc/ailego/CMakeLists.txtbut never defined — so the NEON files get no march flag at all. This fix provides the correct approach: detect+fp16support and append it to the march flag.