This project implements standard Attention and FlashAttention in CUDA, with Python bindings for benchmarking and comparison.
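For reference, standard (scaled dot-product) attention computes softmax(QK^T / sqrt(d)) V over query, key, and value matrices; FlashAttention produces the same result but tiles the computation through on-chip memory with an online softmax, so the full seq_len x seq_len score matrix is never materialized. Below is a minimal NumPy sketch of the reference computation for a single head. It is not code from this repository (the function name and shapes are our own choices for illustration), just the math the CUDA kernels implement:

```python
# Minimal NumPy sketch of standard scaled dot-product attention
# for a single head. Illustrative only; not this repository's code.
import numpy as np

def standard_attention(Q, K, V):
    """Q, K, V: (seq_len, head_dim) arrays; returns (seq_len, head_dim)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (seq_len, seq_len) scores
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```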
To run our benchmarking scripts, first set up a Python environment; then you can run the Python files in the `python` folder.
- Load modules. Run the following in the root of the project:
```
module load cuda/12.8 python/3.12.8 gcc/13.2.0 ninja/1.8.2
```

- Set up the virtual Python environment. Run the following in the root of the project to create a Python virtual environment and install all requirements from `requirements.txt`:

```
./setup_env.sh
```

This might take a while. When done, you should see a `venv` directory in the root of the project.
- Activate the environment. Run the following to activate the newly created environment:

```
source venv/bin/activate
```

You have to be in the root of the project folder to run the Python scripts. Run the following for a basic benchmark example with verification:

```
python3 python/run_benchmark.py --verify
```

or

```
python3 python/run_benchmark.py --seq_lengths "64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072,262144,524288,1048576" --head_dim 128 --num_runs 4 --verify --output out_128.pdf --sweep
```

to reproduce the benchmarking results from our report (this takes a while to run).
The following describes the options you can pass to our Python scripts:
```
usage: run_benchmark.py [-h] [--seq_lengths SEQ_LENGTHS] [--head_dim HEAD_DIM] [--num_runs NUM_RUNS] [--verify]
                        [--output OUTPUT] [--sweep] [--hyperparam-search]

Benchmark attention implementations

options:
  -h, --help            show this help message and exit
  --seq_lengths SEQ_LENGTHS
                        Comma-separated list of sequence lengths (default: 1024)
  --head_dim HEAD_DIM   Head dimension (default: 64)
  --num_runs NUM_RUNS   Number of benchmark runs (default: 10)
  --verify              Verify correctness between implementations
  --output OUTPUT       Output path for benchmark plot
  --sweep               Run sequence length sweep instead of single benchmark
  --hyperparam-search   Run hyperparameter search to find optimal block sizes and thread dimensions
```
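For example, to run the hyperparameter search at a non-default head dimension (an illustrative invocation; how `--hyperparam-search` interacts with the other flags is determined by the script):

```
python3 python/run_benchmark.py --hyperparam-search --head_dim 128
```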