@Dao-AILab

Dao AI Lab

We are an AI research group led by Prof. Tri Dao.

Popular repositories

  1. flash-attention (Public)

    Fast and memory-efficient exact attention

    Python, 23.3k stars, 2.6k forks

  2. quack (Public)

    A Quirky Assortment of CuTe Kernels

    Python, 924 stars, 109 forks

  3. causal-conv1d (Public)

    Causal depthwise conv1d in CUDA, with a PyTorch interface

    CUDA, 827 stars, 171 forks

  4. sonic-moe (Public)

    Accelerating MoE with IO and Tile-aware Optimizations

    Python, 630 stars, 68 forks

  5. fast-hadamard-transform (Public)

    Fast Hadamard transform in CUDA, with a PyTorch interface

    C, 304 stars, 58 forks

  6. grouped-latent-attention (Public)

    Python, 139 stars, 4 forks

Repositories

Showing 10 of 11 repositories
  • flash-attention (Public)

    Fast and memory-efficient exact attention

    Python, BSD-3-Clause, 23,328 stars, 2,612 forks, 985 open issues, 147 open pull requests, updated Apr 13, 2026
  • quack (Public)

    A Quirky Assortment of CuTe Kernels

    Python, Apache-2.0, 924 stars, 109 forks, 16 open issues, 5 open pull requests, updated Apr 13, 2026
  • gram-newton-schulz (Public)

    Fast Polar Decomposition for Muon

    Python, 136 stars, 12 forks, 2 open issues, 1 open pull request, updated Apr 13, 2026
  • dao-ailab.github.io

    HTML, MIT, 0 stars, 1 fork, 0 open issues, 0 open pull requests, updated Apr 10, 2026
  • sonic-moe (Public)

    Accelerating MoE with IO and Tile-aware Optimizations

    Python, Apache-2.0, 630 stars, 68 forks, 13 open issues, 3 open pull requests, updated Apr 1, 2026
  • AI-workflow (Public)

    70 stars, 2 forks, 1 open issue, 0 open pull requests, updated Mar 24, 2026
  • fast-hadamard-transform (Public)

    Fast Hadamard transform in CUDA, with a PyTorch interface

    C, BSD-3-Clause, 304 stars, 58 forks, 7 open issues, 2 open pull requests, updated Mar 10, 2026
  • causal-conv1d (Public)

    Causal depthwise conv1d in CUDA, with a PyTorch interface

    CUDA, BSD-3-Clause, 827 stars, 171 forks, 39 open issues, 13 open pull requests, updated Mar 10, 2026
  • cutlass (Public, forked from NVIDIA/cutlass)

    CUDA Templates for Linear Algebra Subroutines

    C++, 2 stars, 1,797 forks, 0 open issues, 0 open pull requests, updated Jun 9, 2025
  • grouped-latent-attention (Public)

    Python, MIT, 139 stars, 4 forks, 5 open issues, 0 open pull requests, updated May 29, 2025
