This repository contains the implementation of Random-Order Autoregressive (AR) Image Generation on CIFAR-10.
The project establishes a baseline using the standard raster-scan generation order. It serves as a foundation for experimenting with alternative token generation orders (e.g., random permutations), with the goal of improving model calibration and robustness.
The pipeline follows a two-stage approach:
- VQ-VAE Tokenizer: Compresses $32 \times 32$ images into an $8 \times 8$ discrete latent grid.
- Autoregressive Transformer: Models the distribution of discrete tokens to generate new images.
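The tokenizer works as follows. A VQ-VAE replaces each spatial position of the encoder's feature map with the index of its nearest codebook vector. Going from $32 \times 32$ pixels to an $8 \times 8$ grid is a spatial downsampling factor of 4, so every image becomes 64 integer tokens. The pure-Python sketch below shows only that nearest-neighbour lookup. The codebook size, embedding dimension, and the helper name `quantize_grid` are illustrative assumptions, not values or code taken from this repository.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize_grid(features: List[List[Vector]], codebook: List[Vector]) -> List[List[int]]:
    """Replace every feature vector in an H x W grid with the index of its nearest codebook entry.

    Hypothetical helper: illustrates the VQ-VAE nearest-neighbour lookup,
    not the repository's actual tokenizer code.
    """
    return [
        [min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k])) for vec in row]
        for row in features
    ]


# Toy example: a 2-entry codebook with 2-D embeddings (real codebooks are much larger).
codebook = [[0.0, 0.0], [1.0, 1.0]]
# An 8 x 8 grid of encoder features: the left half lies near entry 0, the right half near entry 1.
features = [[[0.1, -0.1] if c < 4 else [0.9, 1.2] for c in range(8)] for _ in range(8)]

tokens = quantize_grid(features, codebook)
print(len(tokens) * len(tokens[0]))  # 64 tokens per image
print(tokens[0])  # [0, 0, 0, 0, 1, 1, 1, 1]
```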
While the baseline uses a fixed raster scan order (row-by-row), this codebase is designed to support research into randomized generation orders.
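The two orderings can be compared on the 64 positions of the $8 \times 8$ grid. In raster order, the token at flat index $i$ sits at row $\lfloor i / 8 \rfloor$ and column $i \bmod 8$. A random order is simply a permutation of those 64 indices. Tokens generated in that order are scattered back to their spatial slots, which applies the inverse permutation. The helper names below are hypothetical; this is a sketch of the idea, not the codebase's implementation.

```python
import random
from typing import List, Optional, Tuple

GRID = 8  # side length of the latent grid (8 x 8 = 64 tokens)


def raster_order(grid: int = GRID) -> List[int]:
    """Row-by-row order: 0, 1, ..., grid*grid - 1."""
    return list(range(grid * grid))


def random_order(grid: int = GRID, seed: Optional[int] = None) -> List[int]:
    """A uniformly random permutation of the grid positions."""
    order = raster_order(grid)
    random.Random(seed).shuffle(order)
    return order


def position(index: int, grid: int = GRID) -> Tuple[int, int]:
    """Map a flat raster index to its (row, col) location."""
    return divmod(index, grid)


def to_raster(tokens_in_order: List[int], order: List[int]) -> List[int]:
    """Scatter tokens generated in `order` back into raster layout."""
    raster = [0] * len(order)
    for step, pos in enumerate(order):
        raster[pos] = tokens_in_order[step]
    return raster


order = random_order(seed=0)
print(sorted(order) == raster_order())  # True: every position is visited exactly once
print(position(10))  # (1, 2)
```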
Pretrained weights for the VQ-VAE tokenizer and the baseline RandAR model (raster order) are available at:
Google Drive: https://drive.google.com/drive/folders/1B528vJu1Icn1PtIwJVfmd39WPNqIEEtg?usp=sharing
This lets you reproduce the reported results without retraining the models.
To reproduce the baseline experiment, please follow the steps below. The project uses uv for fast and reliable dependency management.
- Python: Version 3.12 or higher.
- GPU: An NVIDIA GPU is recommended for training (tested on NVIDIA RTX GPUs).
- uv: Ensure `uv` is installed on your system.
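You can check the prerequisites above with a short standard-library script. This is a convenience sketch that is not part of the repository. It checks the Python version and looks for `uv` on `PATH`. It reports CUDA availability only if PyTorch is already importable; before `uv sync` it reports "unknown".

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 12)):
    """Return a dict of prerequisite checks (hypothetical helper, not in the repo)."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "uv_found": shutil.which("uv") is not None,
        "cuda_available": None,  # None means "unknown": PyTorch is not installed yet
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # only imported when present, e.g. after `uv sync`

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```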
```sh
# Install uv (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install uv (Windows PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

All dependencies are managed via `pyproject.toml`.
Step 2.1: Create environment and install dependencies
Run the following command in the project root. This will create a virtual environment (.venv) and install PyTorch, Transformers, and other required libraries.
```sh
uv sync
```

The experiment consists of five sequential steps. Run each script or notebook in the order listed below.
Download and prepare the CIFAR-10 dataset.
Run `data/load_CIFAR10.py`.

Train the vector-quantized autoencoder (VQ-VAE) to learn the discrete codebook.
- Input: raw images from `data/`
- Output: trained weights (`tokenizer_vq/vqvae_cifar10.pth`)
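For background, the original VQ-VAE formulation combines three terms: a reconstruction term (here written as mean squared error), a codebook term, and a commitment term. It is not stated whether the notebook uses exactly this form or a variant such as exponential-moving-average codebook updates, so read this as reference material rather than a description of the notebook. Here $x$ is the input image and $\hat{x}$ its reconstruction. $z_e(x)$ is the encoder output at a grid position and $e$ its nearest codebook vector. $\mathrm{sg}[\cdot]$ is the stop-gradient operator and $\beta$ the commitment weight:

$$
\mathcal{L} = \lVert x - \hat{x} \rVert_2^2 + \lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2 + \beta \, \lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2
$$

Gradients pass through the non-differentiable nearest-neighbour lookup via the straight-through estimator, $z_q(x) = z_e(x) + \mathrm{sg}[e - z_e(x)]$. In the forward pass this equals $e$. In the backward pass it copies the decoder's gradient to the encoder.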
To train the tokenizer, open and run all cells in the Jupyter notebook:
```sh
uv run jupyter notebook tokenizer_vq/vq-vae.ipynb
```

Encode the entire CIFAR-10 dataset into discrete token sequences using the trained VQ-VAE.
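Conceptually, this step passes every image once through the trained encoder and quantizer. It stores the resulting $8 \times 8$ index grid as a flat sequence of 64 tokens together with the image's class label. The sketch below shows that data flow with a stand-in `encode_to_grid` callable. The function names and the `(tokens, label)` record layout are assumptions for illustration; the format written by `tools/extract_latent_codes.py` may differ.

```python
from typing import Callable, Iterable, List, Tuple

Grid = List[List[int]]


def extract_sequences(
    dataset: Iterable[Tuple[object, int]],
    encode_to_grid: Callable[[object], Grid],
) -> List[Tuple[List[int], int]]:
    """Encode (image, label) pairs into (flat token sequence, label) records.

    `encode_to_grid` stands in for the trained VQ-VAE encoder + quantizer and must
    return an H x W grid of codebook indices (8 x 8 for this project).
    """
    records = []
    for image, label in dataset:
        grid = encode_to_grid(image)
        tokens = [tok for row in grid for tok in row]  # row-by-row (raster) flattening
        records.append((tokens, label))
    return records


# Dummy encoder for demonstration: ignores the image and returns a fixed 8 x 8 pattern.
def fake_encoder(image: object) -> Grid:
    return [[r * 8 + c for c in range(8)] for r in range(8)]


records = extract_sequences([("img0", 3), ("img1", 7)], fake_encoder)
print(len(records), len(records[0][0]), records[1][1])  # 2 64 7
```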
Run `tools/extract_latent_codes.py`.

Train the decoder-only Transformer on the extracted token sequences.
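Training a decoder-only Transformer on these sequences is next-token prediction with teacher forcing: given tokens $0, \dots, t-1$, the model is trained to predict token $t$. The training script is named `train_c2i.py` (class-to-image), so the sketch below assumes the class label is supplied as a single conditioning token at the start of the sequence, a common design in class-conditional AR image models. The id offset that separates class tokens from image tokens, and the helper name, are hypothetical.

```python
from typing import List, Tuple


def make_training_pair(
    tokens: List[int], label: int, codebook_size: int
) -> Tuple[List[int], List[int]]:
    """Build teacher-forcing (inputs, targets) for one class-conditional sequence.

    Hypothetical layout: the class label becomes a single prefix token whose id is
    offset by `codebook_size` so it cannot collide with image-token ids.
    """
    class_token = codebook_size + label
    inputs = [class_token] + tokens[:-1]  # what the model sees at each step
    targets = tokens  # what it must predict at each step (every image token)
    return inputs, targets


inputs, targets = make_training_pair([11, 22, 33, 44], label=2, codebook_size=512)
print(inputs)   # [514, 11, 22, 33]
print(targets)  # [11, 22, 33, 44]
```

For a random-order variant, `tokens` would first be permuted with the chosen order. The model would then also need to know which grid position it is predicting next, which this sketch does not cover.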
Run `train_c2i.py`.

Evaluate the trained AR model by generating samples and computing evaluation metrics.
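Generating a sample reverses the training setup. Starting from the class token, the model is queried repeatedly, and each sampled token is appended to the context until all 64 positions are filled. The loop below sketches this in pure Python with temperature scaling and optional top-k filtering. `next_token_logits` stands in for a forward pass of the trained Transformer. The sampling settings that `eval_c2i.py` actually uses are not documented here.

```python
import math
import random
from typing import Callable, List, Optional


def sample_tokens(
    next_token_logits: Callable[[List[int]], List[float]],
    class_token: int,
    num_tokens: int = 64,
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Autoregressively sample `num_tokens` image tokens after a class prefix token."""
    rng = random.Random(seed)
    context = [class_token]
    for _ in range(num_tokens):
        logits = [x / temperature for x in next_token_logits(context)]
        candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        if top_k is not None:
            candidates = candidates[:top_k]  # keep only the k most likely tokens
        # Softmax weights over the kept candidates (subtract the max for numerical stability).
        best = logits[candidates[0]]
        weights = [math.exp(logits[i] - best) for i in candidates]
        context.append(rng.choices(candidates, weights=weights, k=1)[0])
    return context[1:]  # drop the class token


# Stand-in "model": always prefers token (len(context) % 4) out of a 4-token vocabulary.
def toy_logits(context: List[int]) -> List[float]:
    return [5.0 if i == len(context) % 4 else 0.0 for i in range(4)]


print(sample_tokens(toy_logits, class_token=9, num_tokens=6, top_k=1))  # [1, 2, 3, 0, 1, 2]
```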
Run `eval_c2i.py`.
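Because the five steps must run in sequence, they can be chained in a small driver script. The sketch below is hypothetical. It assumes each script runs from the project root without arguments via `uv run python`. It also executes the tokenizer notebook headlessly with `jupyter nbconvert --execute` instead of opening it interactively, which assumes nbconvert is installed alongside Jupyter. By default it only prints the commands (a dry run).

```python
import shlex
import subprocess
import sys
from typing import List

# Assumed invocations; adjust if a script expects command-line arguments.
STEPS: List[List[str]] = [
    ["uv", "run", "python", "data/load_CIFAR10.py"],
    ["uv", "run", "jupyter", "nbconvert", "--to", "notebook", "--execute",
     "--inplace", "tokenizer_vq/vq-vae.ipynb"],
    ["uv", "run", "python", "tools/extract_latent_codes.py"],
    ["uv", "run", "python", "train_c2i.py"],
    ["uv", "run", "python", "eval_c2i.py"],
]


def run_pipeline(dry_run: bool = True) -> List[str]:
    """Print (and, unless dry_run, execute) each step in order; stop on the first failure."""
    printed = []
    for cmd in STEPS:
        line = shlex.join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a step fails
    return printed


if __name__ == "__main__":
    run_pipeline(dry_run="--run" not in sys.argv)
```

Saved under any name (for example `run_pipeline.py`), running it without arguments previews the commands, and passing `--run` executes them.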