A curated list of awesome papers and tutorials on compilers, programming, performance, and porting for AI accelerators such as SambaNova, Cerebras, Graphcore, and Groq.
- An End-to-End Programming Model for AI Engine Architectures
- [Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation](https://dl.acm.org/doi/10.1145/3543622.3573047)
- An End-to-End Programming Model for AI Engine Architectures (thesis)
- AI Engines and Their Applications
- Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs by Hongwu Peng et al., arXiv 2024
- Revet: A Language and Compiler for Dataflow Threads by Alexander C. Rucker et al., arXiv 2024
- Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques by Sven Verdoolaege et al., IMPACT 2020
- Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors by Zhang et al., SC23 (Intel Gaudi)
- Automated Code Generation of High-Order Stencils for a Dataflow Architecture
- Fast Molecular Dynamics on Wafer-Scale System
- Matrix-Free Finite-Volume Kernels on a Dataflow Architecture
- Survey of Machine Learning Accelerators by Albert Reuther et al., HPEC 2020
- A taxonomy for classification and comparison of dataflows for GNN accelerators by Raveesh Garg et al., OSTI 2021
- Fast Stencil-Code Computation on a Wafer-Scale Processor by Kamil Rocki et al., 2020
- TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine by Nick Brown et al., 2022
- Wafer-Scale Fast Fourier Transforms by Marcelo Orenes-Vera et al., arXiv 2022
- Disruptive Changes in Field Equation Modeling: A Simple Interface for Wafer Scale Engines by Mino Woo et al., 2022
- Massively scalable stencil algorithm by Mathias Jacquelin et al., arXiv 2022
- Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area by Yoshii et al., SC23
- Efficient algorithms for Monte Carlo particle transport on AI accelerator hardware by John Tramm et al., arXiv 2023
- Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems by Hatem Ltaief et al., SC23
- Breaking the Molecular Dynamics Timescale Barrier Using a Wafer-Scale System by Kylee Santos et al., arXiv 2024
- 2D Collective Communication for the Cerebras Wafer-Scale Engine by Louis Schnyder, Bachelor's thesis, ETH Zurich, 2024
- CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2 by Shihui Song et al., HPDC 2024
- Slide FFT on a homogeneous mesh in wafer-scale computing by Maurice H.P.M. van Putten et al., arXiv 2024
- Near-Optimal Wafer-Scale Reduce by Piotr Luczynski et al., HPDC 2024
- SPADA (tools)
- Matrix-Free Finite-Volume Kernels on a Dataflow Architecture (kernel)
- Scalable Distributed High-Order Stencil Computations (25-point stencils) (kernel)
- The Spatial Computer: A model for energy-efficient parallel computation (modelling)
- Sparse Matrix Multiplication on Cerebras WSE-2: SpMM in Spatial Computing (kernel)
- Parallel Sparse Tensor-times-Vector on Cerebras WSE-2 (MS thesis)
- An MLIR Lowering Pipeline for Stencils at Wafer-Scale (tools/compiler)
- Near-Optimal Wafer-Scale Reduce (communication optimization)
- Automated Code Generation of High-Order Stencils for a Dataflow Architecture (codegen/automation)
- A Comparison of the Cerebras Wafer-Scale Integration Technology with NVIDIA GPU-Based Systems for Artificial Intelligence
- DaCe: Data-Centric Parallel Programming by Johannes de Fine Licht (ETH Zurich)
- StencilFlow
- Portable, high-performance Python on CPUs, GPUs, FPGAs (XACC Winter school 2022)
- Stateful Dataflow Multigraphs: Data-Centric Performance Portability on Heterogeneous Architectures
- Bridging Control-Centric and Data-Centric Optimization
- A Data-Centric Optimization Framework for Machine Learning
- Lifting C Semantics for Dataflow Optimization
- Streaming Task Graph Scheduling for Dataflow Architectures
- The spatial computer: A model for energy-efficient parallel computation by Lukas Gianinazzi et al., arXiv 2023
- Dataflow for exascale, 2012 video
- Stanford Seminar - Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier, by Maxeler (since acquired by Groq), a maker of dataflow hardware since 2000
We encourage all contributions to this repository. Open an issue or send a pull request.