DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation
DySL-VLA is a novel framework that accelerates Vision-Language-Action (VLA) model inference for robot manipulation by dynamically skipping unnecessary layers based on action importance. Deploying VLA models on real-world robots is challenging because robotic platforms typically have limited computation and memory capacity. DySL-VLA addresses this by exploiting the varying importance of different actions: critical steps demand high precision, while less important ones can tolerate more variance. It therefore applies different layer-skipping strategies to accelerate VLA inference at different action importance levels.
- Dynamic-Static Layer Skipping: Statically keeps the informative layers and dynamically skips unnecessary layers
- Prior-Post Skipping Guidance: Smart action importance evaluation and adaptive layer skipping for different importance levels
- Skip-Aware Two-Stage Knowledge Distillation: Efficiently transforms standard VLAs into dynamic architectures
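To illustrate the dynamic-static idea in isolation, here is a minimal sketch in plain Python. It is not the DySL-VLA implementation: the function name `run_with_layer_skipping`, the per-layer `skip_scores`, the scalar `importance`, and the threshold rule are all hypothetical, chosen only to show how a fixed set of always-executed layers can coexist with layers that are skipped more aggressively when an action is less important.

```python
from typing import Callable, List, Sequence, Set, Tuple

Layer = Callable[[List[float]], List[float]]


def run_with_layer_skipping(
    hidden: List[float],
    layers: Sequence[Layer],
    static_keep: Set[int],
    skip_scores: Sequence[float],
    importance: float,
    base_threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Run a layer stack, skipping some non-static layers.

    Assumptions (illustrative only, not taken from the paper):
    - skip_scores[i] in [0, 1] says how skippable layer i looks.
    - importance in [0, 1] describes the current action.
    - The skip threshold rises with importance, so important
      actions skip fewer layers.
    """
    # importance = 0 -> threshold = base_threshold
    # importance = 1 -> threshold = 1.0, so no layer is skipped
    threshold = base_threshold + (1.0 - base_threshold) * importance
    executed: List[int] = []
    for i, layer in enumerate(layers):
        is_static = i in static_keep
        if not is_static and skip_scores[i] > threshold:
            continue  # skipped layer acts as the identity
        hidden = layer(hidden)
        executed.append(i)
    return hidden, executed


def add_one(h: List[float]) -> List[float]:
    """Toy stand-in for a transformer layer."""
    return [x + 1.0 for x in h]


layers = [add_one, add_one, add_one, add_one]
static_keep = {0, 3}
skip_scores = [0.9, 0.9, 0.2, 0.9]

low_out, low_layers = run_with_layer_skipping(
    [0.0], layers, static_keep, skip_scores, importance=0.0
)
high_out, high_layers = run_with_layer_skipping(
    [0.0], layers, static_keep, skip_scores, importance=1.0
)
```

In this toy setup, the low-importance call skips layer 1 only, because layers 0 and 3 are statically kept and layer 2 has a low skip score, while the high-importance call executes every layer.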
We develop DySL-VLA based on RoboFlamingo and OpenVLA-oft. Please follow the instructions in the corresponding file.
If you find this work useful for your research, please cite our paper:
@article{yang2025dysl,
title={DySL-VLA: Efficient Vision-Language-Action Model Inference via Dynamic-Static Layer-Skipping for Robot Manipulation},
author={Yang, Zebin and Qi, Yijiahao and Xie, Tong and Yu, Bo and Liu, Shaoshan and Li, Meng},
journal={arXiv preprint arXiv:2602.22896},
year={2025}
}