HEX is a whole-body vision-language-action framework for full-sized humanoid robots.
Updated May 26, 2026 - Jupyter Notebook
Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
Visualize episode embeddings and select maximally diverse training subsets for robotics ML. Train on 10K diverse episodes instead of 50K random ones.
Cross-embodiment visual representation learning using Vision Transformers conditioned on robot kinematic structure via cross-attention.
Imitation Learning for Surgical Robot Task Automation — Behavioral Cloning, DAgger, Diffusion Policy, and VLA models on JIGSAWS surgical demonstrations