
Dexterous World Models

CVPR 2026

📄 Paper | 🌐 Project Page

Official implementation of Dexterous World Models.

TL;DR: DWM is a scene-action-conditioned video diffusion model for simulating embodied dexterous actions in a given static 3D scene.

DWM teaser

🗞️ News

  • April 3, 2026: Code release, including the Wan version of DWM.
  • February 21, 2026: DWM was accepted to CVPR 2026.

Installation

git clone --recursive https://github.com/snuvclab/dwm
cd dwm

# If you already cloned without submodules, run:
# git submodule update --init --recursive

conda create -n dwm python=3.10 -y
conda activate dwm

pip install -r requirements.txt

All commands below assume you run them from the repository root.

Getting Started

Data Preparation

See the preprocessing guides:

The expected processed sample structure is:

<processed_root>/
└── <sample>/
    ├── videos/
    │   └── <stem>.mp4
    ├── videos_static/
    │   └── <stem>.mp4
    ├── videos_hands/
    │   └── <stem>.mp4
    ├── prompts/
    │   └── <stem>.txt
    ├── prompts_rewrite/
    │   └── <stem>.txt
    ├── video_latents/
    │   └── <stem>.pt
    ├── static_video_latents/
    │   └── <stem>.pt
    ├── hand_video_latents/
    │   └── <stem>.pt
    └── prompt_embeds_rewrite/
        └── <stem>.pt

You may place processed data under any root directory you prefer. Training and inference paths can be configured through the example YAML or CLI overrides.

Training

The main training guide is:

Public example config and launcher:

Example smoke run:

bash training/cogvideox/examples/train_static_hand_concat.sh \
  --debug \
  --override data.data_root=/path/to/processed_root \
  --override logging.report_to=none

Inference

Inference supports either a dataset file or a single sample. Example launcher:

bash training/cogvideox/examples/infer_static_hand_concat.sh \
  --checkpoint_path outputs/<date>/<experiment> \
  --data_root /path/to/processed_root \
  --dataset_file dataset_files/trumans_test.txt \
  --output_dir outputs_infer/dwm_cogvideox_dataset

Example dataset files matching the train and test splits used for the paper's models are provided under dataset_files/.

Single-sample inference:

python training/cogvideox/inference.py \
  --checkpoint_path outputs/<date>/<experiment> \
  --experiment_config training/cogvideox/configs/examples/dwm_cogvideox_5b_lora.yaml \
  --data_root /path/to/processed_root \
  --video <relative/path/to/videos/00000.mp4> \
  --output_dir outputs_infer/dwm_cogvideox_single

Notes

  • The default 5B training path typically needs an 80 GB-class GPU.
  • Relative dataset paths in training and inference are resolved inside data_root.
  • If you use a custom processed-data root, update data.data_root in the example config or pass it via CLI overrides.
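The dotted override key suggests the example YAML nests the data root under a data section. A hypothetical excerpt (the exact nesting in the shipped config may differ):

```yaml
# Hypothetical excerpt: assumes the CLI key data.data_root
# maps onto this YAML nesting.
data:
  data_root: /path/to/processed_root
```

The equivalent CLI form is --override data.data_root=/path/to/processed_root, as in the smoke-run example above.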

Acknowledgements

We thank the contributors to VideoX-Fun, finetrainers, CogVideo, and Wan for open-sourcing their work.

Citation

If you find this repository useful, please cite:

@inproceedings{kim2026dwm,
  title={Dexterous World Models},
  author={Kim, Byungjun and Kim, Taeksoo and Lee, Junyoung and Joo, Hanbyul},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
