
fix(cmake): enable FP16 NEON intrinsics on ARM64 GCC #205

Open

thedonmon wants to merge 4 commits into alibaba:main from thedonmon:fix/arm64-fp16-neon


@thedonmon

Summary

Fixes the Linux ARM64 GCC build failure caused by missing +fp16 in -march flags.

The ailego math kernels use FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, vld1q_f16, etc.) which require the +fp16 architecture extension on GCC. Apple Clang enables FP16 by default on ARM64, but GCC does not — it requires explicit -march=armv8.X-a+fp16.

Without this fix, building on Linux ARM64 with GCC fails with:

error: inlining failed in call to 'always_inline' ... : target specific option mismatch

in src/ailego/math/*_fp16.cc files.
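This error only appears once a kernel is compiled. Because check_c_compiler_flag confirms only that the compiler accepts a flag, one way to catch the problem at configure time is to build a tiny translation unit that actually calls an FP16 intrinsic under the candidate flag. The sketch below is an illustration and is not part of this PR. It assumes CMake's CheckCSourceCompiles module, and the function name probe_fp16_neon and the result variable names are hypothetical.

```cmake
include(CheckCSourceCompiles)

# Hypothetical helper (not in the PR): sets <result_var> to a true value if a
# translation unit that calls FP16 NEON intrinsics builds with <march_flag>.
function(probe_fp16_neon march_flag result_var)
  # check_c_source_compiles reads CMAKE_REQUIRED_FLAGS; setting it inside the
  # function keeps the change local to this call.
  set(CMAKE_REQUIRED_FLAGS "${march_flag}")
  # The result is stored as a cache variable, so the caller can read it.
  check_c_source_compiles("
    #include <arm_neon.h>
    int main(void) {
      float16x8_t a = vdupq_n_f16((float16_t)1.0f);
      float16x8_t r = vfmaq_f16(a, a, a); /* needs +fp16 on GCC */
      return (int)vgetq_lane_f16(r, 0);
    }" ${result_var})
endfunction()

if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64)$")
  probe_fp16_neon("-march=armv8.2-a+fp16" HAVE_FP16_NEON_ARMV82A)
  if(NOT HAVE_FP16_NEON_ARMV82A)
    message(STATUS "FP16 NEON intrinsics do not compile with -march=armv8.2-a+fp16")
  endif()
endif()
```

check_c_source_compiles caches its result under the variable name it is given, so each candidate flag needs its own result variable. Reusing one name would silently return the first answer.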

Changes

  • cmake/option.cmake: Updated _detect_armv8_best() to probe for +fp16 compiler support and append it to the detected march flag when available
  • Updated all explicit ENABLE_ARMV8.X options to include +fp16
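The auto-detect change can be pictured as a probe loop. It takes the newest armv8.x-a level the compiler accepts and appends +fp16 only if the combined flag is accepted as well. What follows is a sketch of that flow, not the code in cmake/option.cmake. The function name detect_armv8_best_sketch and the cache-variable names are hypothetical. The version list mirrors the armv8.6-a to armv8-a range in the Greptile review flowchart.

```cmake
include(CheckCCompilerFlag)

# Hypothetical stand-in for _detect_armv8_best(): picks the newest accepted
# -march level, then adds +fp16 only when the compiler accepts that too.
function(detect_armv8_best_sketch)
  foreach(_ver IN ITEMS armv8.6-a armv8.5-a armv8.4-a armv8.3-a armv8.2-a armv8.1-a armv8-a)
    string(MAKE_C_IDENTIFIER "${_ver}" _id)          # armv8.2-a -> armv8_2_a
    check_c_compiler_flag("-march=${_ver}" _SUPP_${_id})
    if(_SUPP_${_id})
      check_c_compiler_flag("-march=${_ver}+fp16" _SUPP_${_id}_fp16)
      if(_SUPP_${_id}_fp16)
        set(_march_flag "-march=${_ver}+fp16")
        message(STATUS "Detected ${_ver}: FP16 NEON enabled")
      else()
        set(_march_flag "-march=${_ver}")
        message(STATUS "Detected ${_ver}: FP16 not supported")
      endif()
      # Propagate the chosen flag to the caller's C and C++ flags.
      set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${_march_flag}" PARENT_SCOPE)
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${_march_flag}" PARENT_SCOPE)
      return()                                       # stop at the newest supported level
    endif()
  endforeach()
endfunction()

if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64)$")
  detect_armv8_best_sketch()
endif()
```

Guarding the call on CMAKE_SYSTEM_PROCESSOR keeps x86 builds from probing ARM-only flags.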

Testing

Verified on all 3 platforms via CI on this fork:

  • linux-arm64 (ubuntu-24.04-arm): Build + C++ tests + Python tests — PASS (was failing before this fix)
  • linux-x64 (ubuntu-24.04): Build + C++ tests + Python tests — PASS (no regression)
  • macos-arm64 (macos-15): Build + C++ tests + Python tests — PASS (no regression)

CI run: https://github.com/thedonmon/zvec/actions/runs/22819664428

Note on PR #193

PR #193 (refactor/march_based_reorganization) is refactoring the same area with per-ISA file dispatch. That PR has a bug in the NEON path: MATH_MARCH_FLAG_NEON is referenced in src/ailego/CMakeLists.txt but never defined — so the NEON files get no march flag at all. This fix provides the correct approach: detect +fp16 support and append it to the march flag.
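An unset CMake variable expands to an empty string without any warning. That is how referencing an undefined MATH_MARCH_FLAG_NEON can compile the NEON files with no -march flag and no error at configure time. Below is a minimal guard, sketched under these assumptions: the detected flag lives in a variable named _march_flag (the name used in the review flowchart, assumed visible at this point), and the *_fp16.cc glob and directory layout are illustrative rather than PR #193's actual structure.

```cmake
# Assumption: _march_flag holds the flag chosen by ARM detection,
# e.g. "-march=armv8.2-a+fp16".
set(MATH_MARCH_FLAG_NEON "${_march_flag}")

# Fail loudly on ARM64 instead of letting an empty flag slip through.
if(CMAKE_SYSTEM_PROCESSOR MATCHES "^(aarch64|arm64)$" AND MATH_MARCH_FLAG_NEON STREQUAL "")
  message(FATAL_ERROR "MATH_MARCH_FLAG_NEON is empty; FP16 NEON sources would be built without -march")
endif()

# Illustrative per-file dispatch: attach the flag only to the FP16 kernels.
# Source properties are directory-scoped, so this must run in the directory
# that defines the target (or use DIRECTORY/TARGET_DIRECTORY, CMake >= 3.18).
file(GLOB _neon_fp16_sources "${CMAKE_CURRENT_SOURCE_DIR}/math/*_fp16.cc")
if(_neon_fp16_sources)
  set_source_files_properties(${_neon_fp16_sources}
    PROPERTIES COMPILE_OPTIONS "${MATH_MARCH_FLAG_NEON}")
endif()
```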

Tests the cmake +fp16 fix on:
- linux-arm64 (ubuntu-24.04-arm) — the platform that was broken
- linux-x64 (ubuntu-24.04) — regression check
- macos-arm64 (macos-15) — regression check
@CLAassistant

CLAassistant commented Mar 8, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR fixes a Linux ARM64 GCC build failure by adding the +fp16 architecture extension to -march flags so that FP16 NEON intrinsics (vfmaq_f16, vsubq_f16, etc.) used in src/ailego/math/*_fp16.cc compile correctly. Apple Clang enables FP16 by default on ARM64; GCC requires the explicit +fp16 suffix.

The auto-detect path (_detect_armv8_best) correctly probes +fp16 support and gracefully falls back to the base flag when unsupported—this part of the fix is well-designed. However, the manual path (ENABLE_ARMV8.X options) unconditionally hardcodes +fp16 in all ARM march flags without a fallback probe. Because add_arch_flag emits FATAL_ERROR when an explicitly-enabled option's flag is unsupported, any user who sets e.g. -DENABLE_ARMV8.2A=ON on a compiler that supports armv8.2-a but not armv8.2-a+fp16 (older GCC versions) will encounter a fatal build error instead of a successful build with the base flag. The auto-detect path's probe-then-fallback pattern should be applied to the manual path as well.

Confidence Score: 2/5

  • The auto-detect path is safe, but the manual ARM path creates a regression: users with older GCC or cross-compilation toolchains will hit FATAL_ERROR instead of a graceful fallback.
  • The auto-detect path's probe-then-fallback logic is correct and well-tested per CI. However, the manual ENABLE_ARMV8.X options unconditionally hardcode +fp16 without probing first or providing a fallback. This breaks a contract with existing users: previously, setting -DENABLE_ARMV8.2A=ON on a compiler supporting the base arch would succeed; now it fails with FATAL_ERROR if the compiler doesn't support +fp16. This is a functional regression for older GCC versions and embedded toolchains, which are common in ARM development.
  • cmake/option.cmake (lines 197–217): Apply the probe-then-fallback pattern from _detect_armv8_best to all ENABLE_ARMV8.X options.

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CMake Configure Start] --> B{AUTO_DETECT_ARCH?}

    B -- Yes --> C{CMAKE_SYSTEM_PROCESSOR\naarch64/arm64?}
    C -- No --> D[_detect_x86_best]
    C -- Yes --> E[_detect_armv8_best]
    E --> F["Loop: armv8.6-a → armv8-a"]
    F --> G{check_c_compiler_flag\n-march=ver}
    G -- Not Supported --> F
    G -- Supported --> H{check_c_compiler_flag\n-march=ver+fp16}
    H -- Supported --> I["_march_flag = -march=ver+fp16\nSTATUS: FP16 NEON enabled"]
    H -- Not Supported --> J["_march_flag = -march=ver\nSTATUS: FP16 not supported"]
    I --> K[Append to CMAKE_C_FLAGS\nand CMAKE_CXX_FLAGS]
    J --> K
    K --> Z[Done]

    B -- No --> L{ENABLE_ARMV8.xA set?}
    L -- No --> Z
    L -- Yes --> M["add_arch_flag(-march=armv8.x-a+fp16)"]
    M --> N{check_c_compiler_flag\n-march=armv8.x-a+fp16}
    N -- Supported --> O[Append to FLAGS]
    N -- Not Supported\nand option is ON --> P["FATAL_ERROR ⚠️\nNo fallback to base flag"]
    O --> Z
```

Last reviewed commit: 2b5eb87

Comment on lines 197 to 217

```diff
 if(ENABLE_ARMV8.6A)
-  add_arch_flag("-march=armv8.6-a" ARMV86A ENABLE_ARMV8.6A)
+  add_arch_flag("-march=armv8.6-a+fp16" ARMV86A ENABLE_ARMV8.6A)
 endif()
 if(ENABLE_ARMV8.5A)
-  add_arch_flag("-march=armv8.5-a" ARMV85A ENABLE_ARMV8.5A)
+  add_arch_flag("-march=armv8.5-a+fp16" ARMV85A ENABLE_ARMV8.5A)
 endif()
 if(ENABLE_ARMV8.4A)
-  add_arch_flag("-march=armv8.4-a" ARMV84A ENABLE_ARMV8.4A)
+  add_arch_flag("-march=armv8.4-a+fp16" ARMV84A ENABLE_ARMV8.4A)
 endif()
 if(ENABLE_ARMV8.3A)
-  add_arch_flag("-march=armv8.3-a" ARMV83A ENABLE_ARMV8.3A)
+  add_arch_flag("-march=armv8.3-a+fp16" ARMV83A ENABLE_ARMV8.3A)
 endif()
 if(ENABLE_ARMV8.2A)
-  add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
+  add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)
 endif()
 if(ENABLE_ARMV8.1A)
-  add_arch_flag("-march=armv8.1-a" ARMV81A ENABLE_ARMV8.1A)
+  add_arch_flag("-march=armv8.1-a+fp16" ARMV81A ENABLE_ARMV8.1A)
 endif()
 if(ENABLE_ARMV8A)
-  add_arch_flag("-march=armv8-a" ARMV8A ENABLE_ARMV8A)
+  add_arch_flag("-march=armv8-a+fp16" ARMV8A ENABLE_ARMV8A)
 endif()
```

The manual ENABLE_ARMV8.X options (lines 197–217) unconditionally hardcode +fp16 and pass the combined flag to add_arch_flag. This creates a regression for users who explicitly enable one of these options on a compiler that supports the base architecture but not the +fp16 extension (e.g., GCC ≤7 or older cross-compilation toolchains).

When check_c_compiler_flag rejects the combined flag, add_arch_flag will emit:

FATAL_ERROR "Compiler does not support required flag: '-march=armv8.2-a+fp16' for ENABLE_ARMV8.2A"
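To make the failure mode concrete, here is a hypothetical reconstruction of a helper with the behavior described above. The flag is appended when the compiler accepts it, and a rejected flag is fatal only when its option was explicitly turned on. The name add_arch_flag_sketch and its internals are assumptions inferred from this review, not the real add_arch_flag in cmake/option.cmake.

```cmake
include(CheckCCompilerFlag)

# Hypothetical reconstruction; the real add_arch_flag() may differ.
function(add_arch_flag_sketch flag name option)
  # Key the cached probe result on the flag itself, so that two different
  # flags registered under the same <name> never share one cached answer.
  string(MAKE_C_IDENTIFIER "${flag}" _id)
  check_c_compiler_flag("${flag}" _COMP_SUPP_${_id})
  if(_COMP_SUPP_${_id})
    set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${flag}" PARENT_SCOPE)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${flag}" PARENT_SCOPE)
    set(${name}_ENABLED ON PARENT_SCOPE)             # hypothetical bookkeeping
  elseif(${option})
    # Explicitly requested but unsupported: stop instead of dropping it silently.
    message(FATAL_ERROR "Compiler does not support required flag: '${flag}' for ${option}")
  endif()
endfunction()
```

With this shape, `add_arch_flag_sketch("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)` stops the configure step whenever ENABLE_ARMV8.2A is ON and the compiler rejects +fp16. This is the regression the review describes.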

The auto-detect path (_detect_armv8_best, lines 77–103) handles this correctly by probing the base arch first, then probing +fp16 separately, and falling back gracefully. The manual path should follow the same pattern:

```cmake
if(ENABLE_ARMV8.2A)
  check_c_compiler_flag("-march=armv8.2-a+fp16" _COMP_SUPP_ARMV82A_fp16)
  if(_COMP_SUPP_ARMV82A_fp16)
    add_arch_flag("-march=armv8.2-a+fp16" ARMV82A ENABLE_ARMV8.2A)
  else()
    add_arch_flag("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
  endif()
endif()
```

(Repeat for each ENABLE_ARMV8.XA option.)
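Instead of repeating that block once per option, the same probe-then-fallback pattern can be driven by a loop over the supported versions. This sketch assumes the project's add_arch_flag(<flag> <name> <option>) helper and the ARMV8xA / ENABLE_ARMV8.xA naming in the diff above. The loop is an illustrative alternative to the per-option blocks and is not the approach this PR took.

```cmake
include(CheckCCompilerFlag)

# Illustrative loop: one probe-then-fallback per manual ENABLE_ARMV8.xA option.
foreach(_ver IN ITEMS 8.6 8.5 8.4 8.3 8.2 8.1 8)
  if(_ver STREQUAL "8")
    set(_march "-march=armv8-a")
    set(_name ARMV8A)
  else()
    set(_march "-march=armv${_ver}-a")
    string(REPLACE "." "" _digits "${_ver}")      # "8.2" -> "82"
    set(_name ARMV${_digits}A)
  endif()
  set(_option ENABLE_ARMV${_ver}A)                # e.g. ENABLE_ARMV8.2A, ENABLE_ARMV8A
  if(${_option})
    check_c_compiler_flag("${_march}+fp16" _COMP_SUPP_${_name}_fp16)
    if(_COMP_SUPP_${_name}_fp16)
      add_arch_flag("${_march}+fp16" ${_name} ${_option})
    else()
      add_arch_flag("${_march}" ${_name} ${_option})
    endif()
  endif()
endforeach()
```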

Author

updated @greptile-apps


Perfect! The changes look great. You've implemented exactly what I suggested:

  1. Created the add_arch_flag_with_fp16 macro (lines 90-99) that probes for +fp16 support and falls back to the base architecture
  2. Updated all manual ARM options (lines 206-222) to use this new macro instead of hardcoding +fp16
  3. Added helpful status messages indicating whether FP16 is enabled or not

This prevents the FATAL_ERROR for users with older compilers (like GCC ≤7) that support ARMv8.X but not +fp16, while still enabling FP16 when available. The solution is clean, DRY, and matches the pattern already used in the auto-detect path.

✅ Issue resolved!

Address review feedback: the manual ENABLE_ARMV8.X options were
hardcoding +fp16 without a fallback, which would cause FATAL_ERROR
on compilers that support the base arch but not +fp16 (older GCC,
cross-compilation toolchains).

Adds add_arch_flag_with_fp16 macro that probes for +fp16 support
first, then falls back to the base -march flag — matching the same
pattern used in _detect_armv8_best() for auto-detection.
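The macro described in this commit could look roughly like the following. This is a sketch reconstructed from the commit message and the review thread, not the code in the PR. It assumes the existing add_arch_flag(<flag> <name> <option>) helper, and the cache-variable naming follows the review suggestion.

```cmake
include(CheckCCompilerFlag)

# Sketch: try <base_flag>+fp16 first and fall back to <base_flag> alone,
# so an explicitly enabled option never fails just because +fp16 is missing.
macro(add_arch_flag_with_fp16 base_flag name option)
  check_c_compiler_flag("${base_flag}+fp16" _COMP_SUPP_${name}_fp16)
  if(_COMP_SUPP_${name}_fp16)
    add_arch_flag("${base_flag}+fp16" ${name} ${option})
    message(STATUS "${option}: FP16 NEON enabled (${base_flag}+fp16)")
  else()
    add_arch_flag("${base_flag}" ${name} ${option})
    message(STATUS "${option}: +fp16 not supported, using ${base_flag}")
  endif()
endmacro()

if(ENABLE_ARMV8.2A)
  add_arch_flag_with_fp16("-march=armv8.2-a" ARMV82A ENABLE_ARMV8.2A)
endif()
```

If the base flag itself is rejected, add_arch_flag still raises its FATAL_ERROR. The fallback only relaxes the +fp16 requirement.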
@thedonmon
Author

Should re-review, comments addressed.

@richyreachy
Collaborator

Thanks @thedonmon. It looks doable. Since we don't have an ARM machine with FP16 handy, verification may take some time.
