Add Apple Metal GPU backend (OpenCL → Metal migration)#18

Merged: marcdegraef merged 14 commits into develop from feature/metal-backend on May 29, 2026

@marcdegraef
Contributor

Apple deprecated OpenCL; this PR adds a native Apple Metal GPU backend so the
GPU-accelerated programs keep working — and run ~2x faster — on Apple Silicon,
while OpenCL stays the backend on Windows/Linux/NVIDIA/AMD.

What changed

  • Phase 0: routed all GPU consumers (mod_MCOpenCL, mod_DI, mod_EBSDFull,
    mod_SEMCLwrappers) through a backend-neutral GPU_T / mod_GPUsupport wrapper
    surface (pure refactor; verified bit-identical to prior OpenCL output).
  • Phase 1: Metal backend — metal-cpp C-ABI shim (vendored headers under
    ExternalProjects/metal-cpp), mod_GPUsupport_metal.f90 drop-in, EMMC.metal;
    selected via CMake source-swap (no program-module changes).
  • Phase 2: DictIndx InnerProd kernel -> Metal (DI dot products bit-identical).
  • Phase 3: EMMCfoil/EMMCxyz MC variant kernels -> Metal. (MBmoduleOpenCL.cl is
    dead code and intentionally not ported.)
  • Phase 4: build-time .metallib + install rules; Metal default-ON on Apple with
    automatic fallback to OpenCL if the Metal toolchain is absent; README + design
    doc (MetalMigrationPlan.md); renamed the abstraction OpenCL_T/mod_CLsupport ->
    GPU_T/mod_GPUsupport (library name EMOpenCLLib kept).

Validation (h5diff vs OpenCL): MC (default/foil/Ivol), DI, EBSDFull, and
SEMCLwrappers are all confirmed. The linear paths are bit-identical; the chaotic
MC output differs only by the expected ±1 electron per bin (max 2), the
floating-point divergence between GPU compilers.
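
The tolerance check described above can be sketched on the host side as follows. This is a minimal illustration, not the project's validation tooling (the PR uses h5diff); the names BinDiff, compare_counts, and within_tolerance are hypothetical, and the count arrays are assumed to have already been read into memory as 32-bit integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Summary of a bin-by-bin comparison between two count arrays.
struct BinDiff {
    std::size_t differing_bins = 0;  // bins whose counts are not equal
    std::int64_t max_abs_diff = 0;   // largest |ref - test| over all bins
};

// Compare two equally sized histograms of electron counts.
// (Hypothetical helper; the array layout is an assumption.)
BinDiff compare_counts(const std::vector<std::int32_t>& ref,
                       const std::vector<std::int32_t>& test) {
    assert(ref.size() == test.size());
    BinDiff d;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        // Widen before subtracting so the difference cannot overflow.
        std::int64_t delta = static_cast<std::int64_t>(ref[i]) -
                             static_cast<std::int64_t>(test[i]);
        if (delta < 0) delta = -delta;
        if (delta != 0) ++d.differing_bins;
        if (delta > d.max_abs_diff) d.max_abs_diff = delta;
    }
    return d;
}

// True when no bin differs by more than `tol` counts.
bool within_tolerance(const BinDiff& d, std::int64_t tol) {
    return d.max_abs_diff <= tol;
}
```

With such a helper, a bit-identical dataset would be accepted with tol = 0, while a chaotic MC accumulator would be accepted with a small tolerance such as 2.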

Build notes

  • Requires full Xcode with the Metal compiler on macOS. A Command-Line-Tools-only install, or Xcode 16+ without the Metal Toolchain component, falls back to OpenCL.
  • Use separate build directories for Metal vs OpenCL (see MetalMigrationPlan.md).

marcdegraef and others added 14 commits May 29, 2026 10:21
Completes the Phase 0 milestone begun in 7b13b0f (GPU-op wrappers on
OpenCL_T + mod_MCOpenCL routed through them):

- Vendor Apple metal-cpp (macOS 15 / iOS 18, Apache-2.0) in-tree under
  ExternalProjects/metal-cpp/ so the future Metal backend build is
  self-contained.
- MetalMigrationPlan.md: full OpenCL->Metal design doc plus a progress
  log recording that EMMCOpenCL output is bit-identical (h5diff -v on
  /EMData reports 0 differences in accum_e/accum_z/accumSP and metadata)
  between the develop and Phase 0 binaries — the wrapper seam is
  behavior-preserving.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mod_DI.f90 (committed in dc6e3d4) is routed through the OpenCL_T GPU-op wrappers: InnerProdGPU, the three GPU drivers' build/buffer blocks, the cl_expt/cl_dict host writes, and all releases.

Verified via EMDI h5diff (develop vs Phase 0, same fixed Euler-angle dictionary): TopDotProductList, TopMatchIndices, EulerAngles, CI, Phi/Phi1/Phi2, KAM, OSM all show 0 differences, so the InnerProd GPU output and orientation results are bit-identical.

Residual differences (DictionaryEulerAngles, ISM, ISMap, IndexingSuccessRate) are upstream of the dot products and trace to known FZ-sampling non-determinism, not this refactor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Completes the Phase 0 active scope. mod_EBSDFull's embedded Monte Carlo sim and the C-callable mod_SEMCLwrappers (EMsoftCgetMCOpenCL) now call the OpenCL_T GPU-op wrappers instead of raw clfortran. Both use the same MC kernel and 14 arguments as the already-verified mod_MCOpenCL full mode.

mod_CLsupport: added an optional 'quiet' flag to all GPU-op wrappers (via a checkq_ helper) that suppresses the fatal error_check (and build_program's build-log print). Default behavior is unchanged, so the already-verified MC/DI/EBSDFull non-quiet call paths are byte-for-byte identical. mod_SEMCLwrappers calls the wrappers with quiet=.TRUE. to preserve its original 'ignore CL errors, defer to caller' semantics, since it is invoked from external host programs where a hard stop would be incorrect.
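
The quiet semantics can be illustrated with a small sketch. It is written in C++ rather than the Fortran of mod_CLsupport, and everything in it is an assumption made for illustration: the function name check_status, the integer status convention (0 = success), and the use of an exception as a stand-in for the fatal error_check.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status convention: 0 means success, anything else is an error.
using Status = int;

// Returns true on success.
// On failure: throws (stand-in for a fatal stop) unless `quiet` is set,
// in which case it returns false and leaves the decision to the caller.
bool check_status(Status status, const std::string& what, bool quiet = false) {
    if (status == 0) return true;
    if (quiet) return false;
    throw std::runtime_error(what + " failed with status " + std::to_string(status));
}

// Usage check: the default path is fatal, the quiet path defers to the caller.
inline void check_status_usage() {
    assert(check_status(0, "create_buffer"));
    assert(!check_status(-5, "create_buffer", /*quiet=*/true));
    bool threw = false;
    try {
        check_status(-5, "create_buffer");
    } catch (const std::runtime_error&) {
        threw = true;
    }
    assert(threw);
}
```

Defaulting quiet to false is what keeps existing call sites unchanged: only callers that opt in, such as a library entry point invoked from an external host program, see the non-fatal behavior.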

mod_HROSM confirmed to have no GPU code. mod_DIPCA (uncompiled, absent from all CMakeLists) still has raw clfortran calls and is left as a documented TODO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a working Metal compute backend alongside OpenCL, validated on the Monte Carlo path. Selected at build time via -DEMsoftOO_ENABLE_Metal_SUPPORT=ON (Apple-only); CMake source-swaps the GPU backend so the program modules are unchanged.

New:

- metal/EMMC.metal: MSL port of EMMC.cl, behaviour-preserving.
- metal/emtl_shim.{h,cpp}: C-ABI shim over vendored metal-cpp, with unified-memory buffers, a live-buffer registry to pick setBuffer vs setBytes, and per-pipeline cached args re-applied per enqueue to mirror OpenCL set-once/dispatch-many.
- mod_CLsupport_metal.f90: drop-in module mod_CLsupport / type OpenCL_T over the shim, same public surface incl. quiet.
- clfortran_metal_stub.f90: CL_MEM_* constants so the program modules' 'use clfortran' statements resolve without the real lib.

CMake builds *.metal -> *.metallib at build time and places them beside the .cl files.
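
The shim's argument handling (set once, re-apply on every enqueue, and choose between a buffer binding and an inline-bytes binding) might look roughly like the sketch below. This is not the code in emtl_shim.cpp: MockEncoder, CachedArg, and Pipeline are hypothetical names, the encoder is a recording stand-in with simplified setBuffer/setBytes signatures rather than the metal-cpp encoder, and the rule "a pointer-sized argument whose value is a registered handle is a buffer" is an assumption about how such a registry could be consulted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <set>
#include <string>
#include <vector>

// Recording stand-in for a compute encoder (not the metal-cpp type).
struct MockEncoder {
    std::vector<std::string> calls;
    void setBuffer(const void* buffer, std::size_t index) {
        (void)buffer;
        calls.push_back("setBuffer@" + std::to_string(index));
    }
    void setBytes(const void* bytes, std::size_t length, std::size_t index) {
        (void)bytes; (void)length;
        calls.push_back("setBytes@" + std::to_string(index));
    }
};

// One cached kernel argument: either a live buffer handle or a byte copy.
struct CachedArg {
    const void* buffer = nullptr;
    std::vector<unsigned char> bytes;
};

class Pipeline {
public:
    explicit Pipeline(const std::set<const void*>& live_buffers)
        : live_(live_buffers) {}

    // Mirrors the OpenCL "set once" model: the value is cached until overwritten.
    void set_arg(std::size_t index, const void* value, std::size_t size) {
        assert(value != nullptr && size > 0);
        CachedArg arg;
        if (size == sizeof(void*)) {
            const void* handle = nullptr;
            std::memcpy(&handle, value, sizeof handle);
            if (live_.count(handle) != 0) arg.buffer = handle;  // registered buffer
        }
        if (arg.buffer == nullptr) {  // plain scalar/struct: keep a copy of the bytes
            const unsigned char* p = static_cast<const unsigned char*>(value);
            arg.bytes.assign(p, p + size);
        }
        args_[index] = arg;
    }

    // Every dispatch re-applies all cached args to a fresh encoder.
    void enqueue(MockEncoder& enc) const {
        for (const auto& entry : args_) {
            const CachedArg& arg = entry.second;
            if (arg.buffer != nullptr) enc.setBuffer(arg.buffer, entry.first);
            else enc.setBytes(arg.bytes.data(), arg.bytes.size(), entry.first);
        }
    }

private:
    const std::set<const void*>& live_;
    std::map<std::size_t, CachedArg> args_;
};
```

The replay step exists because an encoder is created per dispatch, so bindings set on a previous encoder do not carry over, whereas OpenCL kernel arguments persist on the kernel object between launches.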

Validation: EMMCOpenCL on Metal vs OpenCL reference (h5diff /EMData): accumSP bit-identical; accum_e/accum_z differ only by +/-1 electron per bin (max 2), the inherent chaotic-MC floating-point divergence between the Metal and OpenCL compilers. Physics equivalent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Translates the InnerProd tiled GEMM (BLOCK_SIZE=16) from DictIndx.cl to metal/DictIndx.metal and adds DictIndx to the build-time metallib list. mod_DI was already wrapper-routed (Phase 0), so the whole DI path runs on the Metal backend with no further code changes.
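
For readers unfamiliar with tiled GEMM, the loop blocking can be sketched on the CPU as below. This is only an illustration of the tiling idea with a 16-wide block, not a translation of the InnerProd kernel: the row-major layout, the function names, and the plain C = A * B formulation are assumptions, and a GPU kernel would stage each tile in threadgroup (local) memory instead of reading it directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK_SIZE = 16;  // tile edge, matching the value quoted above

// C (M x N) = A (M x K) * B (K x N), all row-major; processed tile by tile.
std::vector<float> matmul_tiled(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += BLOCK_SIZE)
        for (std::size_t j0 = 0; j0 < N; j0 += BLOCK_SIZE)
            for (std::size_t k0 = 0; k0 < K; k0 += BLOCK_SIZE) {
                // Clamp the tile at the matrix edges.
                const std::size_t i1 = std::min(i0 + BLOCK_SIZE, M);
                const std::size_t j1 = std::min(j0 + BLOCK_SIZE, N);
                const std::size_t k1 = std::min(k0 + BLOCK_SIZE, K);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t j = j0; j < j1; ++j) {
                        float acc = C[i * N + j];  // partial sum from earlier k-tiles
                        for (std::size_t k = k0; k < k1; ++k)
                            acc += A[i * K + k] * B[k * N + j];
                        C[i * N + j] = acc;
                    }
            }
    return C;
}

// Untiled reference with the same k-ascending accumulation order.
std::vector<float> matmul_naive(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    return C;
}
```

Both functions add the products for a given output element in ascending k order, which is the kind of property that lets a ported kernel reproduce the reference dot products exactly, provided the compiler does not fuse the multiply-adds differently.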

Validated: EMDI on Metal vs OpenCL reference (h5diff /Scan 1) -- TopDotProductList, TopMatchIndices, EulerAngles, CI, Phi/Phi1/Phi2, KAM, OSM all 0 differences (the Metal InnerProd dot products are bit-identical; no FMA divergence). Residual differences (DictionaryEulerAngles, ISM, ISMap, IndexingSuccessRate) are the pre-existing FZ-sampling non-determinism seen in the Phase 0 OpenCL-vs-OpenCL run, not the Metal backend.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dead

Translates the remaining live OpenCL MC kernels to MSL: metal/EMMCfoil.metal (foil-geometry MC, transmitted electrons accumulated in the southern hemisphere) and metal/EMMCxyz.metal (MCxyz, interaction-volume xyz output). Both reuse EMMC.metal's validated MC structure and mod_MCOpenCL already routes them through the wrappers, so no host changes. Added to the build-time metallib list.

Scoping correction: MBmoduleOpenCL.cl (ScatMat/CalcLgh/CalcLghMaster) is dead code -- no compiled module loads it; master patterns use the CPU ZGEEV CalcLgh in mod_gvectors. It is intentionally not built. With foil/Ivol done, every live OpenCL kernel (EMMC/EMMCfoil/EMMCxyz/InnerProd) is now ported to Metal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
opencl/SourceList.cmake installs the build-time *.metallib (from Bin/opencl/) to <install>/opencl/ when Metal is enabled, via FILES_MATCHING so it tracks whatever kernels were built. Records Phase 4 status in the plan: default-Metal-on deferred until remaining verifications pass (and gated on xcrun metal availability); the GPU_T/EMGPULib rename left as an open decision (cosmetic, high-churn).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
EMsoftOO_ENABLE_Metal_SUPPORT now defaults ON on APPLE (OFF/forced elsewhere), with a non-fatal configure warning if the Metal toolchain (xcrun -sdk macosx metal) is missing so CLT-only builds get a clear message. The GPU_T/EMGPULib rename is deferred to a separate branch (cosmetic, high-churn). Migration is functionally complete: all live OpenCL kernels ported, Metal is the default GPU backend on Apple.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On Apple, EMsoftOO_ENABLE_Metal_SUPPORT still defaults ON, but if 'xcrun -sdk macosx -f metal' is not found at configure time (Command-Line-Tools-only install, or Xcode 16+ without the Metal Toolchain component), the build now forces Metal OFF and falls back to the OpenCL backend (re-enabling OpenCL if needed) with an explanatory warning, instead of failing at the .metallib step. A clean default build therefore works out of the box on any Mac.
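
The configure-time decision described above reduces to a small truth table. The sketch below expresses it in C++ purely as an illustration; the real logic lives in the project's CMake files, and the names ConfigureInputs, BackendChoice, and choose_backend are hypothetical.

```cpp
#include <cassert>

enum class Backend { Metal, OpenCL };

// Inputs known at configure time (hypothetical names).
struct ConfigureInputs {
    bool is_apple;              // building on an Apple platform
    bool metal_requested;       // Metal support option is ON
    bool metal_toolchain_found; // the Metal compiler lookup succeeded
};

struct BackendChoice {
    Backend backend;
    bool warn_fallback;  // true when Metal was requested but could not be used
};

BackendChoice choose_backend(const ConfigureInputs& in) {
    // Metal is Apple-only; elsewhere, or when switched off, OpenCL is used.
    if (!in.is_apple || !in.metal_requested) return {Backend::OpenCL, false};
    // Requested but no Metal compiler: fall back with a warning instead of failing.
    if (!in.metal_toolchain_found) return {Backend::OpenCL, true};
    return {Backend::Metal, false};
}

// Usage check covering the three cases described in the commit message.
inline void choose_backend_usage() {
    BackendChoice full_xcode = choose_backend({true, true, true});
    assert(full_xcode.backend == Backend::Metal && !full_xcode.warn_fallback);

    BackendChoice clt_only = choose_backend({true, true, false});
    assert(clt_only.backend == Backend::OpenCL && clt_only.warn_fallback);

    BackendChoice linux_box = choose_backend({false, false, false});
    assert(linux_box.backend == Backend::OpenCL && !linux_box.warn_fallback);
}
```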

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a section describing the native Metal backend that replaces OpenCL on macOS: what is ported (MC + DI kernels), that it is default-on on Apple with automatic OpenCL fallback, the vendored metal-cpp dependency, and the full-Xcode Metal-compiler build requirement.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Backend-neutral rename of the GPU abstraction now that both OpenCL and Metal back it: type OpenCL_T -> GPU_T and module mod_CLsupport -> mod_GPUsupport (files mod_CLsupport*.f90 -> mod_GPUsupport*.f90), across both backend implementations and all consumers (mod_MCOpenCL, mod_DI, mod_EBSDFull, mod_SEMCLwrappers, the dead mod_DIPCA, Utilities/EMOpenCLinfo, the clfortran stub, CMake source paths). A word-boundary-safe substitution preserved MCOpenCL_T (the MC program class).
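
A word-boundary-safe substitution of this kind can be sketched with std::regex. The commit does not say which tool performed the rename, so this is only an illustration of the technique; the function name rename_gpu_type is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace the standalone identifier OpenCL_T with GPU_T.
// \b requires a word boundary on both sides, so MCOpenCL_T is left alone:
// there is no boundary between the 'C' and the 'O' inside that name.
std::string rename_gpu_type(const std::string& source) {
    static const std::regex type_name(R"(\bOpenCL_T\b)");
    return std::regex_replace(source, type_name, "GPU_T");
}

// Usage check: the abstraction is renamed, the MC program class is preserved.
inline void rename_gpu_type_usage() {
    assert(rename_gpu_type("type(OpenCL_T) :: CL") == "type(GPU_T) :: CL");
    assert(rename_gpu_type("type(MCOpenCL_T) :: MC") == "type(MCOpenCL_T) :: MC");
}
```

A plain substring replacement would have turned MCOpenCL_T into MCGPU_T, which is the failure the boundary anchors guard against.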

Per decision, the library/folder name EMOpenCLLib is kept (renaming to EMGPULib would churn every modality CMakeLists + export targets for no functional gain). Local handle variables stay named CL. This completes the OpenCL->Metal migration; all phases done and runtime-verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…adowing)

The Metal-only clfortran stub produces a clfortran.mod that, if a build dir is reconfigured Metal->OpenCL, shadows the real clfortran and breaks the OpenCL backend. Documents the fix (separate build dirs; non-Apple OpenCL builds unaffected) and the alternative of dropping the stub.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@marcdegraef marcdegraef merged commit c85a3aa into develop May 29, 2026
1 of 2 checks passed
@marcdegraef marcdegraef deleted the feature/metal-backend branch May 29, 2026 18:01