
Affine Transformations (FlatQuant) for Diffusion-Based Transformer Models

In this work, we apply invertible affine transformations before post-training quantization to reduce outliers and quantization error in diffusion models. The approach is inspired by FlatQuant, which applies learned affine transformations to language models; here we adapt and optimize the idea specifically for transformer-based diffusion models.
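For intuition, here is a minimal, self-contained sketch of the idea (not this repository's implementation). It uses the simplest invertible transform, a per-channel diagonal scaling s, folded into a linear layer: since x @ W.T == (x / s) @ (W * s).T exactly, an outlier-heavy activation channel can be partially migrated into the weights before both are fake-quantized. FlatQuant learns a full invertible matrix per layer; the diagonal transform, the min-max quantizer, and all shapes below are illustrative assumptions.

import torch

def fake_quant(t, bits=8):
    # Symmetric per-tensor min-max fake quantization (illustrative only).
    qmax = 2 ** (bits - 1) - 1
    scale = t.abs().max().clamp(min=1e-8) / qmax
    return torch.round(t / scale).clamp(-qmax, qmax) * scale

torch.manual_seed(0)
W = torch.randn(64, 64)        # weights of a linear layer, y = x @ W.T
x = torch.randn(16, 64)        # a batch of activations
x[:, 0] *= 50.0                # inject an outlier activation channel

# Simplest invertible transform: a per-channel diagonal scaling s that partially
# migrates the outlier from the activations into the weights. The full-precision
# output is unchanged because (x / s) @ (W * s).T == x @ W.T exactly.
s = x.abs().amax(dim=0).clamp(min=1e-5).sqrt()

y_ref   = x @ W.T
y_plain = fake_quant(x) @ fake_quant(W).T          # quantize without any transform
y_flat  = fake_quant(x / s) @ fake_quant(W * s).T  # quantize after the transform

print("mean abs error, no transform:  ", (y_ref - y_plain).abs().mean().item())
print("mean abs error, with transform:", (y_ref - y_flat).abs().mean().item())

In this toy setup, the transformed path typically shows a lower output error, which is the effect the learned affine transformations aim to achieve systematically across the model.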


Demo

To run the demo (inference) for this project, use the following command:

python ./main.py \
    --model ./modelzoo/pixart-sigma/PixArt-Sigma-XL-2-1024-MS \
    --w_bits 8 --a_bits 8 \
    --k_bits 8 --k_asym --k_groupsize 128 \
    --v_bits 8 --v_asym --v_groupsize 128 \
    --cali_dataset coco \
    --nsamples 4 --cali_timesteps 10 \
    --cali_bsz 4 --flat_lr 5e-3 \
    --lwc --lac --cali_trans --add_diag \
    --output_dir ./outputs --resume --reload_matrix \
    --prompt "[YOUR PROMPT HERE]"

Parameters:

w_bits, a_bits, k_bits, v_bits: quantization bit-widths for the weights, activations, attention keys, and attention values.
--prompt: the text prompt to generate from (default: "A beautiful world").

Recommended settings: W8A8 or W6A6 (adjust the key and value bit-widths as needed). The generated image is saved to ./demo_image.png. A sketch of what the asymmetric, group-wise key/value flags typically mean follows below.
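The --k_asym/--v_asym and --k_groupsize/--v_groupsize flags in the demo command suggest asymmetric, group-wise quantization of the attention keys and values with a group size of 128. The sketch below shows one common way such a quantizer works; it is an assumption for illustration, not this repository's code, and the tensor shape is made up.

import torch

def group_asym_fake_quant(t, bits=8, group_size=128):
    # Asymmetric min-max fake quantization over contiguous groups of `group_size`
    # elements along the last dimension (which must be divisible by group_size).
    shape = t.shape
    t = t.reshape(-1, group_size)
    qmax = 2 ** bits - 1
    t_min = t.amin(dim=1, keepdim=True)
    t_max = t.amax(dim=1, keepdim=True)
    scale = (t_max - t_min).clamp(min=1e-8) / qmax
    zero = torch.round(-t_min / scale)
    q = torch.round(t / scale + zero).clamp(0, qmax)
    return ((q - zero) * scale).reshape(shape)

keys = torch.randn(2, 16, 256)                      # (batch, tokens, channels), assumed shape
keys_q = group_asym_fake_quant(keys, bits=8, group_size=128)
print("max key reconstruction error:", (keys - keys_q).abs().max().item())

Smaller group sizes and asymmetric ranges generally track local statistics more closely at the cost of storing more scales and zero points.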
