fix(cmake): enable FP16 NEON intrinsics on ARM64 GCC by thedonmon · Pull Request #205 · alibaba/zvec

thedonmon · 2026-03-08T20:05:33Z

Summary

Fixes the Linux ARM64 GCC build failure caused by missing +fp16 in -march flags.

The ailego math kernels use FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, vld1q_f16, etc.) which require the +fp16 architecture extension on GCC. Apple Clang enables FP16 by default on ARM64, but GCC does not — it requires explicit -march=armv8.X-a+fp16.

Without this fix, building on Linux ARM64 with GCC fails with:

error: inlining failed in call to 'always_inline' ... : target specific option mismatch

in src/ailego/math/*_fp16.cc files.
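One way to reproduce this failure at configure time, rather than deep into the build, is to try compiling a tiny FP16 NEON program under a candidate flag. The fragment below is a minimal sketch, not code from this PR: `FP16_TEST_MARCH` and `HAVE_FP16_NEON_INTRINSICS` are hypothetical names, and it assumes it runs inside a project with the CXX language enabled.

```cmake
# Sketch: test whether FP16 NEON intrinsics compile under a given -march flag.
# FP16_TEST_MARCH and HAVE_FP16_NEON_INTRINSICS are hypothetical names.
include(CheckCXXSourceCompiles)

set(FP16_TEST_MARCH "-march=armv8.2-a+fp16")

# CMAKE_REQUIRED_FLAGS is added to the compile command of the check below.
set(_saved_required_flags "${CMAKE_REQUIRED_FLAGS}")
set(CMAKE_REQUIRED_FLAGS "${FP16_TEST_MARCH}")

check_cxx_source_compiles("
#include <arm_neon.h>
int main() {
  float16x8_t a = vdupq_n_f16(1);
  float16x8_t b = vdupq_n_f16(2);
  float16x8_t c = vfmaq_f16(a, a, b);  // fused multiply-add on 8 half floats
  c = vsubq_f16(c, b);
  return static_cast<int>(vgetq_lane_f16(c, 0));
}
" HAVE_FP16_NEON_INTRINSICS)

# Restore the previous value so later checks are unaffected.
set(CMAKE_REQUIRED_FLAGS "${_saved_required_flags}")

if(HAVE_FP16_NEON_INTRINSICS)
  message(STATUS "FP16 NEON intrinsics compile with ${FP16_TEST_MARCH}")
else()
  message(STATUS "FP16 NEON intrinsics do not compile with ${FP16_TEST_MARCH}")
endif()
```

A flag-only probe such as `check_c_compiler_flag` confirms that the compiler accepts the option; a source-level check like this one additionally confirms that the intrinsics themselves build. On a non-ARM host the check simply reports failure.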

Changes

cmake/option.cmake: Updated _detect_armv8_best() to probe for +fp16 compiler support and append it to the detected march flag when available
Updated all explicit ENABLE_ARMV8.X options to include +fp16
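The auto-detect change described above can be pictured roughly as follows. This is a sketch of the probe-then-append idea only, not the actual body of `_detect_armv8_best()` (which is not shown in this conversation); the function name `sketch_detect_armv8_best` and the `SKETCH_HAS_*` cache variables are made up for illustration.

```cmake
# Sketch of the probe-then-append idea; not the actual cmake/option.cmake code.
include(CheckCCompilerFlag)

function(sketch_detect_armv8_best out_var)
  set(_result "")
  # Try the newest architecture level first, then fall back to older ones.
  foreach(_ver IN ITEMS armv8.6-a armv8.5-a armv8.4-a armv8.3-a armv8.2-a armv8.1-a armv8-a)
    # Each probe caches its result, so derive a unique variable name per version.
    string(MAKE_C_IDENTIFIER "SKETCH_HAS_${_ver}" _base_ok)
    check_c_compiler_flag("-march=${_ver}" ${_base_ok})
    if(${_base_ok})
      # Base level is accepted; probe the same level with +fp16 appended.
      string(MAKE_C_IDENTIFIER "SKETCH_HAS_${_ver}_FP16" _fp16_ok)
      check_c_compiler_flag("-march=${_ver}+fp16" ${_fp16_ok})
      if(${_fp16_ok})
        set(_result "-march=${_ver}+fp16")
      else()
        set(_result "-march=${_ver}")
      endif()
      break()
    endif()
  endforeach()
  set(${out_var} "${_result}" PARENT_SCOPE)
endfunction()

# Usage: append whatever was detected to both C and C++ flags.
sketch_detect_armv8_best(_march_flag)
if(_march_flag)
  string(APPEND CMAKE_C_FLAGS " ${_march_flag}")
  string(APPEND CMAKE_CXX_FLAGS " ${_march_flag}")
endif()
```

Probing the base level and the `+fp16` variant separately is what allows a fallback to the plain flag on compilers that reject the extension.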

Testing

Verified on all 3 platforms via CI on this fork:

linux-arm64 (ubuntu-24.04-arm): Build + C++ tests + Python tests — PASS (was failing before this fix)
linux-x64 (ubuntu-24.04): Build + C++ tests + Python tests — PASS (no regression)
macos-arm64 (macos-15): Build + C++ tests + Python tests — PASS (no regression)

CI run: https://github.com/thedonmon/zvec/actions/runs/22819664428

Note on PR #193

PR #193 (refactor/march_based_reorganization) is refactoring the same area with per-ISA file dispatch. That PR has a bug in the NEON path: MATH_MARCH_FLAG_NEON is referenced in src/ailego/CMakeLists.txt but never defined — so the NEON files get no march flag at all. This fix provides the correct approach: detect +fp16 support and append it to the march flag.
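For a per-file dispatch scheme, a cheap guard against this class of bug is to check that the flag variable exists before applying it. The fragment below is a hypothetical sketch: how PR #193 actually attaches flags to sources is not shown here, `NEON_SOURCES` is an assumed name, and the glob pattern is borrowed from the file path quoted in the summary.

```cmake
# Hypothetical sketch: warn if a per-ISA flag variable was never defined.
# NEON_SOURCES is an assumed name; MATH_MARCH_FLAG_NEON is the variable named above.
file(GLOB NEON_SOURCES "${CMAKE_SOURCE_DIR}/src/ailego/math/*_fp16.cc")

if(NOT DEFINED MATH_MARCH_FLAG_NEON)
  # An undefined variable expands to an empty string, so the files would
  # silently be compiled without any -march flag.
  message(WARNING "MATH_MARCH_FLAG_NEON is not defined; NEON sources get no -march flag")
else()
  # COMPILE_OPTIONS is a per-source-file property (CMake 3.11 and later).
  set_source_files_properties(${NEON_SOURCES}
    PROPERTIES COMPILE_OPTIONS "${MATH_MARCH_FLAG_NEON}")
endif()
```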

The ailego math kernels use FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, vld1q_f16, etc.) which require the +fp16 architecture extension on GCC. Apple Clang enables FP16 by default on ARM64, but GCC does not; it requires explicit -march=armv8.X-a+fp16.

This patch:
- Updates _detect_armv8_best() to probe for +fp16 support and append it to the detected march flag when available
- Updates all explicit ENABLE_ARMV8.X options to include +fp16

Without this fix, building on Linux ARM64 with GCC fails with "target specific option mismatch" for FP16 NEON intrinsics in src/ailego/math/*_fp16.cc

CLAassistant · 2026-03-08T20:05:39Z

All committers have signed the CLA.

greptile-apps · 2026-03-08T20:09:22Z

Greptile Summary

This PR fixes a Linux ARM64 GCC build failure by adding the +fp16 architecture extension to -march flags so that FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, etc.) used in src/ailego/math/*_fp16.cc compile correctly. Apple Clang enables FP16 by default on ARM64; GCC requires the explicit +fp16 suffix.

The auto-detect path (_detect_armv8_best) correctly probes +fp16 support and gracefully falls back to the base flag when unsupported—this part of the fix is well-designed. However, the manual path (ENABLE_ARMV8.X options) unconditionally hardcodes +fp16 in all ARM march flags without a fallback probe. Because add_arch_flag emits FATAL_ERROR when an explicitly-enabled option's flag is unsupported, any user who sets e.g. -DENABLE_ARMV8.2A=ON on a compiler that supports armv8.2-a but not armv8.2-a+fp16 (older GCC versions) will encounter a fatal build error instead of a successful build with the base flag. The auto-detect path's probe-then-fallback pattern should be applied to the manual path as well.

Confidence Score: 2/5

The auto-detect path is safe, but the manual ARM path creates a regression: users with older GCC or cross-compilation toolchains will hit FATAL_ERROR instead of a graceful fallback.
The auto-detect path's probe-then-fallback logic is correct and well-tested per CI. However, the manual ENABLE_ARMV8.X options unconditionally hardcode +fp16 without probing first or providing a fallback. This breaks a contract with existing users: previously, setting -DENABLE_ARMV8.2A=ON on a compiler supporting the base arch would succeed; now it fails with FATAL_ERROR if the compiler doesn't support +fp16. This is a functional regression for older GCC versions and embedded toolchains, which are common in ARM development.
cmake/option.cmake (lines 197–217): Apply the probe-then-fallback pattern from _detect_armv8_best to all ENABLE_ARMV8.X options.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CMake Configure Start] --> B{AUTO_DETECT_ARCH?}

    B -- Yes --> C{CMAKE_SYSTEM_PROCESSOR\naarch64/arm64?}
    C -- No --> D[_detect_x86_best]
    C -- Yes --> E[_detect_armv8_best]
    E --> F["Loop: armv8.6-a → armv8-a"]
    F --> G{check_c_compiler_flag\n-march=ver}
    G -- Not Supported --> F
    G -- Supported --> H{check_c_compiler_flag\n-march=ver+fp16}
    H -- Supported --> I["_march_flag = -march=ver+fp16\nSTATUS: FP16 NEON enabled"]
    H -- Not Supported --> J["_march_flag = -march=ver\nSTATUS: FP16 not supported"]
    I --> K[Append to CMAKE_C_FLAGS\nand CMAKE_CXX_FLAGS]
    J --> K
    K --> Z[Done]

    B -- No --> L{ENABLE_ARMV8.xA set?}
    L -- No --> Z
    L -- Yes --> M["add_arch_flag(-march=armv8.x-a+fp16)"]
    M --> N{check_c_compiler_flag\n-march=armv8.x-a+fp16}
    N -- Supported --> O[Append to FLAGS]
    N -- Not Supported\nand option is ON --> P["FATAL_ERROR ⚠️\nNo fallback to base flag"]
    O --> Z

Last reviewed commit: 2b5eb87

greptile-apps · 2026-03-08T20:09:26Z

cmake/option.cmake

  if(ENABLE_ARMV8.6A)
-    add_arch_flag("-march=armv8.6-a" ARMV86A ENABLE_ARMV8.6A)
+    add_arch_flag("-march=armv8.6-a+fp16" ARMV86A ENABLE_ARMV8.6A)
  endif()
  if(ENABLE_ARMV8.5A)
-    add_arch_flag("-march=armv8.5-a" ARMV85A ENABLE_ARMV8.5A)
+    add_arch_flag("-march=armv8.5-a+fp16" ARMV85A ENABLE_ARMV8.5A)
  endif()
  if(ENABLE_ARMV8.4A)
-    add_arch_flag("-march=armv8.4-a" ARMV84A ENABLE_ARMV8.4A)
+    add_arch_flag("-march=armv8.4-a+fp16" ARMV84A ENABLE_ARMV8.4A)
  endif()
  if(ENABLE_ARMV8.3A)
-    add_arch_flag("-march=armv8.3-a" ARMV83A ENABLE_ARMV8.3A)
+    add_arch_flag("-march=armv8.3-a+fp16" ARMV83A ENABLE_ARMV8.3A)
  endif()
  if(ENABLE_ARMV8.2A)
-    add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
+    add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)
  endif()
  if(ENABLE_ARMV8.1A)
-    add_arch_flag("-march=armv8.1-a" ARMV81A ENABLE_ARMV8.1A)
+    add_arch_flag("-march=armv8.1-a+fp16" ARMV81A ENABLE_ARMV8.1A)
  endif()
  if(ENABLE_ARMV8A)
-    add_arch_flag("-march=armv8-a" ARMV8A ENABLE_ARMV8A)
+    add_arch_flag("-march=armv8-a+fp16" ARMV8A ENABLE_ARMV8A)
  endif()


The manual ENABLE_ARMV8.X options (lines 197–217) unconditionally hardcode +fp16 and pass the combined flag to add_arch_flag. This creates a regression for users who explicitly enable one of these options on a compiler that supports the base architecture but not the +fp16 extension (e.g., GCC ≤7 or older cross-compilation toolchains).

When check_c_compiler_flag rejects the combined flag, add_arch_flag will emit:

FATAL_ERROR "Compiler does not support required flag: '-march=armv8.2-a+fp16' for ENABLE_ARMV8.2A"

The auto-detect path (_detect_armv8_best, lines 77–103) handles this correctly by probing the base arch first, then probing +fp16 separately, and falling back gracefully. The manual path should follow the same pattern:

  if(ENABLE_ARMV8.2A)
    check_c_compiler_flag("-march=armv8.2-a+fp16" _COMP_SUPP_ARMV82A_fp16)
    if(_COMP_SUPP_ARMV82A_fp16)
      add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)
    else()
      add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
    endif()
  endif()

(Repeat for each ENABLE_ARMV8.XA option.)

updated @greptile-apps

Perfect! The changes look great. You've implemented exactly what I suggested:

Created the add_arch_flag_with_fp16 macro (lines 90-99) that probes for +fp16 support and falls back to the base architecture

Updated all manual ARM options (lines 206-222) to use this new macro instead of hardcoding +fp16

Added helpful status messages indicating whether FP16 is enabled or not

This prevents the FATAL_ERROR for users with older compilers (like GCC ≤7) that support ARMv8.X but not +fp16, while still enabling FP16 when available. The solution is clean, DRY, and matches the pattern already used in the auto-detect path.

✅ Issue resolved!

Address review feedback: the manual ENABLE_ARMV8.X options were hardcoding +fp16 without a fallback, which would cause FATAL_ERROR on compilers that support the base arch but not +fp16 (older GCC, cross-compilation toolchains). Adds add_arch_flag_with_fp16 macro that probes for +fp16 support first, then falls back to the base -march flag — matching the same pattern used in _detect_armv8_best() for auto-detection.
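The wrapper described in this commit message could look roughly like the sketch below. The actual macro body is not shown in this conversation, so this is an illustration under assumptions: it relies on the project's existing `add_arch_flag(<flag> <tag> <option>)` helper as it appears in the diff above, and it uses the name `sketch_add_arch_flag_with_fp16` to avoid implying it matches the real `add_arch_flag_with_fp16`.

```cmake
include(CheckCCompilerFlag)

# Sketch of a probe-then-fallback wrapper. It assumes the project's existing
# add_arch_flag(<flag> <tag> <option>) helper, as used in the diff above.
macro(sketch_add_arch_flag_with_fp16 base_flag tag option_name)
  # Probe the +fp16 variant first; the result is cached per tag.
  check_c_compiler_flag("${base_flag}+fp16" _COMP_SUPP_${tag}_FP16)
  if(_COMP_SUPP_${tag}_FP16)
    message(STATUS "${option_name}: FP16 enabled (${base_flag}+fp16)")
    add_arch_flag("${base_flag}+fp16" ${tag} ${option_name})
  else()
    # Fall back to the base flag; add_arch_flag still errors out if even
    # the base architecture is unsupported.
    message(STATUS "${option_name}: +fp16 not supported, using ${base_flag}")
    add_arch_flag("${base_flag}" ${tag} ${option_name})
  endif()
endmacro()

# Usage: one call per manual option replaces the hardcoded +fp16 flag.
if(ENABLE_ARMV8.2A)
  sketch_add_arch_flag_with_fp16("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
endif()
```

Keeping the fallback inside a single macro means each `ENABLE_ARMV8.X` block stays a one-liner, which is the DRY property the review reply points out.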

thedonmon · 2026-03-08T20:20:01Z

Should re-review, comments addressed.

richyreachy · 2026-03-11T06:02:41Z

Thanks @thedonmon. It looks doable. As we don't have an ARM+FP16 machine handy, verification may take some time.

thedonmon added 3 commits March 8, 2026 04:52

ci: add workflow to test ARM64 FP16 fix on all platforms

68558fe

Tests the cmake +fp16 fix on:
- linux-arm64 (ubuntu-24.04-arm): the platform that was broken
- linux-x64 (ubuntu-24.04): regression check
- macos-arm64 (macos-15): regression check

chore: remove test workflow (verification complete)

2b5eb87

greptile-apps bot reviewed Mar 8, 2026

feihongxu0824 requested a review from richyreachy March 9, 2026 01:22

feihongxu0824 assigned richyreachy Mar 9, 2026

richyreachy approved these changes Mar 10, 2026

thedonmon wants to merge 4 commits into alibaba:main from thedonmon:fix/arm64-fp16-neon

3 participants