YAMBOrghini: GPU-Accelerated Yambo for Apple Silicon
YAMBOrghini brings GPU acceleration to Yambo many-body perturbation theory calculations on Apple Silicon using the apple-bottom BLAS library and Metal compute shaders.
Yambo is a many-body perturbation theory code for calculating electronic and optical properties from first principles. YAMBOrghini routes Yambo's critical BLAS operations (six ZGEMM calls) through apple-bottom, which dispatches sufficiently large operations to Metal-optimized kernels.
- Minimal Integration: Replace 6 BLAS calls, relink with apple-bottom
- Automatic Routing: Operations below 100M FLOPs run on the CPU; operations at or above that threshold run on the GPU
- Numerical Accuracy: ~10⁻¹⁵ relative error, validated against OpenBLAS
- Zero Dependencies: No new Fortran modules; ab_zgemm is brought in through a Fortran EXTERNAL declaration
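The routing rule in the list above can be sketched in a few lines of shell. This is only an illustration, not apple-bottom's actual code: the function names `zgemm_flops` and `route` are hypothetical, and the cost model (8·M·N·K real FLOPs for a complex matrix multiply) is an assumption, since the way the library counts FLOPs is not stated here.

```bash
#!/usr/bin/env bash
# Sketch of FLOP-based routing; NOT apple-bottom's implementation.
# Assumption: a complex GEMM of shape (M x K) * (K x N) costs 8*M*N*K real FLOPs.

THRESHOLD_FLOPS=100000000   # 100M FLOPs, the cutoff quoted in this README

# Hypothetical helper: estimated real FLOPs for one ZGEMM call.
zgemm_flops() {
  local m=$1 n=$2 k=$3
  echo $(( 8 * m * n * k ))
}

# Hypothetical helper: print the backend a call of this shape would use.
route() {
  local flops
  flops=$(zgemm_flops "$1" "$2" "$3")
  if (( flops >= THRESHOLD_FLOPS )); then
    echo "GPU"
  else
    echo "CPU"
  fi
}

route 200 200 200     # 64,000,000 FLOPs    -> prints CPU
route 1000 1000 50    # 400,000,000 FLOPs   -> prints GPU
```

Under this cost model, a square multiply crosses the threshold at roughly N = 233, which is why small test systems see almost no GPU work.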
Proof of Concept (v0.1.0) - Functional with ongoing performance characterization
| Component | Status |
|---|---|
| Build System | ✅ Tested on macOS 14, M2 Max |
| Numerical Correctness | ✅ Validated against CPU builds |
| Small Systems (< 2K basis) | |
| Large Systems (> 4K basis) | 🔄 Testing in progress |
| BSE/TDDFT Iterative Solvers | 🔄 Not yet tested |
System: 2-atom Si, ~1000 plane waves, 4 k-points, 50 bands
| Configuration | Ranks | Threads | Time | vs Baseline |
|---|---|---|---|---|
| CPU (OpenBLAS) | 4 | 4 | 19s | 1.00× |
| GPU (Metal) | 4 | 12 | 75s | 0.25× |
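Analysis: The Metal build is roughly 4× slower here (75 s vs 19 s). At this size most operations fall below the 100M FLOP threshold and run on the CPU anyway, so there is little GPU work to offset the overhead. This is expected for small matrices.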
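For reference, the "vs Baseline" column is the baseline time divided by the measured time, so values below 1.00× indicate a slowdown. A minimal sketch of that arithmetic (the `speedup` helper is hypothetical, not part of the repository):

```bash
#!/usr/bin/env bash
# speedup = baseline_time / measured_time (both in seconds)
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

speedup 19 19   # CPU (OpenBLAS) against itself -> prints 1.00
speedup 19 75   # GPU (Metal) against CPU       -> prints 0.25
```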
System: 64-atom Si supercell, ~18K plane waves, 150 bands
Status: Benchmarking in progress
Expected: 1.2-1.5× speedup, extrapolated from apple-bottom's validation with Quantum ESPRESSO, which showed a 1.22× speedup at an equivalent system size.
Based on apple-bottom library validation:
- Small systems (N < 2048): CPU recommended (GPU overhead > compute savings)
- Medium systems (2K-8K basis): 1.1-1.3× expected
- Large systems (> 8K basis): 1.2-2.0× expected
- Iterative solvers (BSE/TDDFT): 1.5-3.0× expected (overhead amortized)
- macOS 14+ (Sonoma)
- Apple Silicon (M1/M2/M3/M4)
- Xcode Command Line Tools
- Homebrew packages: gcc, openblas, fftw, hdf5, netcdf, netcdf-fortran, libxc, open-mpi, scalapack
# 1. Install dependencies
brew install gcc openblas fftw hdf5 netcdf netcdf-fortran libxc open-mpi scalapack
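# 2. Install apple-bottom (cloned into $HOME so it matches the -L path in step 6)
cd "$HOME"
git clone https://github.com/grantdh/apple-bottom.git
(cd apple-bottom && make && make test)   # subshell: stay in $HOME for step 3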
# 3. Clone YAMBOrghini
git clone https://github.com/grantdh/YAMBOrghini.git
cd YAMBOrghini
# 4. Get Yambo source (example for 5.3.0)
# curl ships with macOS; wget is not in the dependency list above
curl -LO http://www.yambo-code.eu/files/yambo-5.3.0.tar.gz
tar xzf yambo-5.3.0.tar.gz && cd yambo-5.3.0
# 5. Apply patches
patch -p1 < ../patches/mod_wrapper.patch
patch -p1 < ../patches/mod_wrapper_omp.patch
# 6. Configure
./configure FC=mpifort CC=mpicc \
--enable-open-mp \
--with-blas-libs="-L/opt/homebrew/opt/openblas/lib -lopenblas \
-L$HOME/apple-bottom/build -lapplebottom \
-framework Metal -framework Foundation \
-framework CoreGraphics -framework Accelerate -lc++" \
--enable-hdf5-io --enable-par-linalg
# 7. Build
make -j8
# 8. Verify
./bin/yambo -h
Detailed instructions: See docs/INTEGRATION_GUIDE.md
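One way to confirm that the patches from step 5 landed is to count the `ab_zgemm` call sites in the two wrapper files. The sketch below is a sanity check under assumptions, not part of the repository: the helper name `count_ab_zgemm_calls` is hypothetical, and it assumes each replacement appears as a `call ab_zgemm` statement on a single non-comment line. If that holds, each file should report 3.

```bash
#!/usr/bin/env bash
# Count non-comment lines that call ab_zgemm in a Fortran source file.
# Case-insensitive, because Fortran is; lines with '!' before the call are skipped.
count_ab_zgemm_calls() {
  grep -ciE '^[^!]*call[[:space:]]+ab_zgemm' "$1"
}

# Run from the root of the patched Yambo source tree.
for f in src/modules/mod_wrapper.F src/modules/mod_wrapper_omp.F; do
  if [ -f "$f" ]; then
    echo "$f: $(count_ab_zgemm_calls "$f") ab_zgemm call(s)"
  else
    echo "$f: not found (run from the Yambo source root)"
  fi
done
```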
Yambo (Fortran)
↓ mod_wrapper.F: ZGEMM → ab_zgemm
↓
apple-bottom (C/Objective-C++)
↓ Threshold check (100M FLOPs)
├─ < 100M FLOPs → OpenBLAS (CPU)
└─ ≥ 100M FLOPs → Metal kernels (GPU)
↓ Double-float emulation (FP32×2)
↓ ~10⁻¹⁵ precision
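The precision figure in the diagram is consistent with a back-of-envelope estimate for double-float arithmetic. The estimate assumes each value is stored as an unevaluated sum of two FP32 numbers, which is the usual construction; apple-bottom's exact scheme is not described in this README.

$$
x \approx x_{\mathrm{hi}} + x_{\mathrm{lo}}, \qquad |x_{\mathrm{lo}}| \le \tfrac{1}{2}\,\mathrm{ulp}(x_{\mathrm{hi}})
$$

$$
\text{significand bits} \approx 2 \times 24 = 48 \;\Rightarrow\; \varepsilon \approx 2^{-48} \approx 3.6 \times 10^{-15}
$$

For comparison, native FP64 has a 53-bit significand, giving $2^{-53} \approx 1.1 \times 10^{-16}$. Under these assumptions the emulated format is about one to two decimal digits short of native double precision, and it keeps FP32's exponent range.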
- src/modules/mod_wrapper.F: 3 ZGEMM → ab_zgemm replacements
- src/modules/mod_wrapper_omp.F: 3 ZGEMM → ab_zgemm replacements
- config/setup: Link flags for apple-bottom and Metal frameworks
Total changes: 6 function calls
- Integration Guide: Complete build instructions
- Performance Profiling: Detailed performance analysis and optimization
- NVIDIA Comparison Plan: Methodology for GPU comparisons
- Benchmark Results: Test cases and timing data
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Areas of interest:
- Benchmarking on different Apple Silicon hardware
- Testing with BSE/TDDFT calculations
- Optimization of routing thresholds
- Extension to other DFT/MBPT codes
If you use YAMBOrghini in your research, please cite:
@software{yamborghini2026,
author = {Heileman, Grant},
title = {YAMBOrghini: GPU-Accelerated Yambo for Apple Silicon},
year = {2026},
url = {https://github.com/grantdh/YAMBOrghini}
}
Please also cite the underlying projects:
Yambo:
@article{yambo2019,
title = {Many-body perturbation theory calculations using the yambo code},
author = {Sangalli, D. and Ferretti, A. and Miranda, H. and others},
journal = {J. Phys.: Condens. Matter},
volume = {31},
pages = {325902},
year = {2019}
}
apple-bottom:
@software{applebottom2026,
author = {Heileman, Grant},
title = {apple-bottom: Metal-accelerated BLAS for Apple Silicon},
year = {2026},
url = {https://github.com/grantdh/apple-bottom}
}
YAMBOrghini is released under the MIT License. See LICENSE for details.
Note: Yambo is licensed under GPL v2. When used with Yambo, derivative works must comply with GPL requirements.
- Yambo Team for the many-body perturbation theory code
- Quantum ESPRESSO Team for validation benchmarks
- Apple for Metal framework and developer tools
- University of New Mexico for computational resources
- Yambo: Many-body perturbation theory code
- apple-bottom: Metal-accelerated BLAS library
- Quantum ESPRESSO: Density functional theory code
Grant Heileman, University of New Mexico, Department of Electrical and Computer Engineering
For bug reports and feature requests, please use the issue tracker.