
YAMBOrghini


GWiz: GPU-Accelerated Yambo for Apple Silicon

YAMBOrghini brings GPU acceleration to Yambo many-body perturbation theory calculations on Apple Silicon using the apple-bottom BLAS library and Metal compute shaders.


Overview

Yambo is a many-body perturbation theory code for calculating the electronic and optical properties of materials from first principles. YAMBOrghini adds GPU support on Apple Silicon by redirecting selected complex matrix multiplications (ZGEMM) to apple-bottom, which runs large ones on Metal-optimized kernels and keeps small ones on the CPU.

Key Features

  • Minimal Integration: Replace 6 ZGEMM calls and relink with apple-bottom
  • Automatic Routing: Operations below 100M FLOPs run on the CPU; operations at or above 100M FLOPs run on the GPU
  • Numerical Accuracy: ~10⁻¹⁵ relative error, validated against OpenBLAS
  • No New Module Dependencies: ab_zgemm is declared with a Fortran EXTERNAL statement, so no interface module or USE statement is needed

Status

Proof of Concept (v0.1.0): functional, with performance characterization ongoing

| Component | Status |
|---|---|
| Build System | ✅ Tested on macOS 14, M2 Max |
| Numerical Correctness | ✅ Validated against CPU builds |
| Small Systems (< 2K basis) | ⚠️ GPU overhead dominates (expected) |
| Large Systems (> 4K basis) | 🔄 Testing in progress |
| BSE/TDDFT Iterative Solvers | 🔄 Not yet tested |

Performance

Benchmark Results

si2-gw (Small System)

System: 2-atom Si, ~1000 plane waves, 4 k-points, 50 bands

| Configuration | MPI Ranks | Threads | Time | Speedup vs. Baseline |
|---|---|---|---|---|
| CPU (OpenBLAS) | 4 | 4 | 19 s | 1.00× |
| GPU (Metal) | 4 | 12 | 75 s | 0.25× |

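Analysis: The GPU build is about 4× slower (75 s vs. 19 s). This is expected for small matrices: most ZGEMM calls fall below the 100M-FLOP threshold and run on the CPU anyway, so the GPU path contributes overhead but little offloaded work. Note that the two runs also use different thread counts (4 vs. 12), so this is not a pure CPU-vs-GPU comparison.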

si64-gw (Large System)

System: 64-atom Si supercell, ~18K plane waves, 150 bands

Status: Benchmarking in progress

Expected: 1.2-1.5× speedup, extrapolated from apple-bottom's Quantum ESPRESSO validation, which showed a 1.22× speedup at a comparable system size.

Performance Characteristics

Based on apple-bottom library validation:

  • Small systems (< 2K basis): CPU recommended (GPU overhead exceeds compute savings)
  • Medium systems (2K-8K basis): 1.1-1.3× speedup expected
  • Large systems (> 8K basis): 1.2-2.0× speedup expected
  • Iterative solvers (BSE/TDDFT): 1.5-3.0× speedup expected (overhead amortized over many calls)
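The size regimes above fit a simple fixed-overhead cost model. The model is an illustrative assumption, not taken from apple-bottom: $F$ is the FLOP count of one call (8MNK for ZGEMM under the usual convention), $t_0$ is a fixed per-call GPU cost (data transfer, command encoding, synchronization), $T_{\mathrm{init}}$ is a one-time setup cost shared by $n$ calls, and $R_{\mathrm{CPU}} < R_{\mathrm{GPU}}$ are sustained FLOP rates.

$$
t_{\mathrm{CPU}}(F) = \frac{F}{R_{\mathrm{CPU}}}, \qquad
t_{\mathrm{GPU}}(F) = t_0 + \frac{F}{R_{\mathrm{GPU}}}
$$

$$
t_{\mathrm{GPU}}(F) < t_{\mathrm{CPU}}(F)
\iff
F > F^{*} = \frac{t_0}{1/R_{\mathrm{CPU}} - 1/R_{\mathrm{GPU}}}
$$

$$
\bar{t}_{\mathrm{GPU}}(F) = \frac{T_{\mathrm{init}}}{n} + t_0 + \frac{F}{R_{\mathrm{GPU}}}
$$

Below $F^{*}$ the fixed cost dominates, matching the small-system behavior; in iterative solvers $n$ is large, so the $T_{\mathrm{init}}/n$ term shrinks. In this model, the 100M-FLOP routing threshold can be read as a fixed estimate of $F^{*}$.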

Installation

Prerequisites

  • macOS 14+ (Sonoma)
  • Apple Silicon (M1/M2/M3/M4)
  • Xcode Command Line Tools
  • Homebrew packages: gcc, openblas, fftw, hdf5, netcdf, netcdf-fortran, libxc, open-mpi, scalapack

Quick Start

# 1. Install dependencies
brew install gcc openblas fftw hdf5 netcdf netcdf-fortran libxc open-mpi scalapack

# 2. Install apple-bottom into $HOME (step 6 links against $HOME/apple-bottom/build)
cd "$HOME"
git clone https://github.com/grantdh/apple-bottom.git
cd apple-bottom && make && make test && cd ..

# 3. Clone YAMBOrghini
git clone https://github.com/grantdh/YAMBOrghini.git
cd YAMBOrghini

# 4. Get Yambo source (example for 5.3.0)
curl -LO http://www.yambo-code.eu/files/yambo-5.3.0.tar.gz
tar xzf yambo-5.3.0.tar.gz && cd yambo-5.3.0

# 5. Apply patches
patch -p1 < ../patches/mod_wrapper.patch
patch -p1 < ../patches/mod_wrapper_omp.patch

# 6. Configure
./configure FC=mpifort CC=mpicc \
  --enable-open-mp \
  --with-blas-libs="-L/opt/homebrew/opt/openblas/lib -lopenblas \
                    -L$HOME/apple-bottom/build -lapplebottom \
                    -framework Metal -framework Foundation \
                    -framework CoreGraphics -framework Accelerate -lc++" \
  --enable-hdf5-io --enable-par-linalg

# 7. Build
make -j8

# 8. Verify
./bin/yambo -h

Detailed instructions: See docs/INTEGRATION_GUIDE.md


Architecture

Integration Overview

Yambo (Fortran)
    ↓ mod_wrapper.F: ZGEMM → ab_zgemm
    ↓
apple-bottom (C/Objective-C++)
    ↓ Threshold check (100M FLOPs)
    ├─ < 100M FLOPs → OpenBLAS (CPU)
    └─ ≥ 100M FLOPs → Metal kernels (GPU)
        ↓ Double-float emulation (FP32×2)
        ↓ ~10⁻¹⁵ precision

Modified Files

  • src/modules/mod_wrapper.F: 3 ZGEMM → ab_zgemm replacements
  • src/modules/mod_wrapper_omp.F: 3 ZGEMM → ab_zgemm replacements
  • config/setup: Link flags for apple-bottom and Metal frameworks

Total code changes: 6 ZGEMM call sites, plus link flags




Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Areas of interest:

  • Benchmarking on different Apple Silicon hardware
  • Testing with BSE/TDDFT calculations
  • Optimization of routing thresholds
  • Extension to other DFT/MBPT codes

Citation

If you use YAMBOrghini in your research, please cite:

@software{yamborghini2026,
  author = {Heileman, Grant},
  title = {YAMBOrghini: GPU-Accelerated Yambo for Apple Silicon},
  year = {2026},
  url = {https://github.com/grantdh/YAMBOrghini}
}

Please also cite the underlying projects:

Yambo:

@article{yambo2019,
  title = {Many-body perturbation theory calculations using the yambo code},
  author = {Sangalli, D. and Ferretti, A. and Miranda, H. and others},
  journal = {J. Phys.: Condens. Matter},
  volume = {31},
  pages = {325902},
  year = {2019}
}

apple-bottom:

@software{applebottom2026,
  author = {Heileman, Grant},
  title = {apple-bottom: Metal-accelerated BLAS for Apple Silicon},
  year = {2026},
  url = {https://github.com/grantdh/apple-bottom}
}

License

YAMBOrghini is released under the MIT License. See LICENSE for details.

Note: Yambo is licensed under GPL v2. When used with Yambo, derivative works must comply with GPL requirements.


Acknowledgments

  • Yambo Team for the many-body perturbation theory code
  • Quantum ESPRESSO Team for validation benchmarks
  • Apple for Metal framework and developer tools
  • University of New Mexico for computational resources



Contact

Grant Heileman, Department of Electrical and Computer Engineering, University of New Mexico

For bug reports and feature requests, please use the issue tracker.