This is the official implementation for the paper, "BOSS: Benchmark for Observation Space Shift in Long-Horizon Task".
BOSS is adapted from Libero and OpenVLA. Create the conda environment as shown below, then install the required packages by following the installation instructions of those two repos.
conda create --name boss python=3.10
# For more details, please follow the installation instructions in Libero & OpenVLA.
Download the assets folder (link), and put it under the root folder ./.
Download Libero-100, and put its LIBERO-90 subset into the ./datasets folder.
Run the following command to rename it and keep only the single-skill tasks, which form the boss_44 dataset.
python scripts/form_boss_44_dataset.py
The BOSS benchmark codebase has 2 parts:
Skills Training: Train single skills using the 4 baseline algorithms: BC-RESNET-RNN, BC-RESNET-T, BC-VIT-T, OpenVLA.
Challenges: 3 challenges, namely BOSS-CH1, BOSS-CH2, and BOSS-CH3.
To train skills with the BC-* baselines, execute the following commands.
# For BC-RESNET-RNN
python libero/lifelong/train_skills.py policy="bc_rnn_policy"
# For BC-RESNET-T
python libero/lifelong/train_skills.py policy="bc_transformer_policy"
# For BC-VIT-T
python libero/lifelong/train_skills.py policy="bc_vilt_policy"
For OpenVLA, execute the following commands.
cd openvla
zsh shells/run_openvla.sh
The checkpoint will be saved at runs/libero44/1.0.0/openvla-7b+libero44+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug.
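The checkpoint directory name packs the fine-tuning settings into `+`-separated fields. As a rough illustration, the sketch below splits that name into a dictionary. The helper `parse_run_name` is hypothetical, and the meaning assigned to each field (`b8` as batch size, `lora-r32` as LoRA rank, and so on) is an assumption read off the name, not something taken from the BOSS or OpenVLA code.

```python
def parse_run_name(name: str) -> dict:
    """Split an OpenVLA-style run directory name into its fields.

    Hypothetical helper: the field meanings are guesses based on the
    directory name shown above, not an API from BOSS or OpenVLA.
    """
    # A trailing "--<tag>" suffix (here "image_aug") is separated first.
    base, _, suffix = name.partition("--")
    model, dataset, *options = base.split("+")
    fields = {"model": model, "dataset": dataset, "suffix": suffix or None}
    for opt in options:
        if opt.startswith("lr-"):
            fields["learning_rate"] = float(opt[len("lr-"):])
        elif opt.startswith("lora-r"):
            fields["lora_rank"] = int(opt[len("lora-r"):])
        elif opt.startswith("dropout-"):
            fields["dropout"] = float(opt[len("dropout-"):])
        elif opt.startswith("b") and opt[1:].isdigit():
            fields["batch_size"] = int(opt[1:])
    return fields


run = "openvla-7b+libero44+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug"
info = parse_run_name(run)
print(info)
```

A helper like this can be handy when several runs with different settings sit side by side under runs/libero44/ and need to be told apart in a script.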
To evaluate the BC-* baselines on BOSS-CH1, execute the following commands.
# Test BC-RESNET-RNN when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-RNN when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-T when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-T when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-VIT-T when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-VIT-T when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000
Checkpoints will be saved at ./experiments/boss_44/0.0.0/BC{algo}Policy_seed10000/run_00*/.
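The six commands above differ only in the policy folder and in the script/benchmark pair. A minimal sketch of generating them in a loop is given below, assuming it is run from the repository root. The helper `build_eval_commands` is hypothetical; the script paths, flags, and folder names are copied from the commands above. It only prints the commands rather than launching them.

```python
import shlex

# Policy folder names exactly as they appear in the commands above.
POLICIES = ["BCRNNPolicy", "BCTransformerPolicy", "BCViLTPolicy"]

# (script, benchmark) pairs for the unaffected and affected settings.
SETTINGS = [
    ("libero/lifelong/eval_skills_unaffected_by_oss.py", "boss_44"),
    ("libero/lifelong/eval_skills_affected_by_oss.py", "ch1"),
]


def build_eval_commands(seed: int = 10000, run: str = "run_001") -> list[list[str]]:
    """Return one argument list per (policy, setting) combination."""
    commands = []
    for policy in POLICIES:
        folder = f"./experiments/boss_44/0.0.0/{policy}_seed{seed}/{run}/"
        for script, benchmark in SETTINGS:
            commands.append([
                "python", script,
                "--benchmark", benchmark,
                "--model_path_folder", folder,
                "--seed", str(seed),
            ])
    return commands


commands = build_eval_commands()
for cmd in commands:
    # Printed only; pass cmd to subprocess.run(cmd, check=True) from the
    # repository root to actually launch an evaluation.
    print(shlex.join(cmd))
```

Building argument lists instead of shell strings avoids quoting problems if a checkpoint path ever contains spaces.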
For OpenVLA, execute the following commands.
# Test OpenVLA when unaffected by OSS
python experiments/robot/libero/eval_openvla_ch1_ch2.py --seed 10000 --task_suite_name "boss_44"
# Test OpenVLA when affected by OSS
python experiments/robot/libero/eval_openvla_ch1_ch2.py --seed 10000 --task_suite_name "ch1"
Checkpoints will be saved at ./experiments/logs/.
To evaluate the BC-* baselines on BOSS-CH2, execute the following commands.
# Test BC-RESNET-RNN when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-RNN when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-T when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-RESNET-T when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-VIT-T when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000
# Test BC-VIT-T when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000
For OpenVLA, execute the following commands.
# Test OpenVLA when affected by OSS - 2 modifications
python experiments/robot/libero/eval_openvla_ch1_ch2.py --seed 10000 --task_suite_name "ch2_2_modifications"
# Test OpenVLA when affected by OSS - 3 modifications
python experiments/robot/libero/eval_openvla_ch1_ch2.py --seed 10000 --task_suite_name "ch2_3_modifications"
Checkpoints will be saved at ./experiments/logs/.
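For OpenVLA, the same evaluation script is used with four different `--task_suite_name` values across this README. The sketch below collects them in one place and rejects names that do not appear here, which guards against typos in long suite names. The helper `openvla_eval_command` is hypothetical; the script path, flags, and suite names are copied from the commands in this README.

```python
import shlex

SCRIPT = "experiments/robot/libero/eval_openvla_ch1_ch2.py"

# Task suite names used with this script in the README.
TASK_SUITES = ["boss_44", "ch1", "ch2_2_modifications", "ch2_3_modifications"]


def openvla_eval_command(task_suite: str, seed: int = 10000) -> list[str]:
    """Build the argument list for one OpenVLA evaluation run."""
    if task_suite not in TASK_SUITES:
        raise ValueError(f"unknown task suite: {task_suite!r}")
    return ["python", SCRIPT, "--seed", str(seed), "--task_suite_name", task_suite]


openvla_commands = [openvla_eval_command(s) for s in TASK_SUITES]
for cmd in openvla_commands:
    # Printed only; nothing is executed by this sketch.
    print(shlex.join(cmd))
```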
To evaluate the BC-* baselines on BOSS-CH3, execute the following commands.
python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/"
python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/"
python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/"
For OpenVLA, execute the following commands.
cd openvla
python experiments/robot/libero/eval_openvla_ch3.py --seed 10000
Checkpoints will be saved at ./experiments/logs/ch3/.
# Scale up the number of modified bddl files
python RAMG/DA_bddl_files_scale_up_single_modification.py
# To include multiple modifications in a single bddl file, run the following command
python RAMG/DA_bddl_files_scale_up_multiple_modifications.py
After scaling up the number of modified bddl files, generate the augmented dataset from the boss_44 dataset and the enlarged set of bddl files.
# Given boss_44 dataset and set of modified bddl files, generate the augmented dataset
python scripts/DA_demos_generation.py
For OpenVLA, use openvla/experiments/robot/libero/regenerate_libero_dataset.py to regenerate the no-op dataset, then use rlds_dataset_builder to convert it to an RLDS dataset. For more details, see the README in the OpenVLA repo.
To access our generated dataset, please contact yygx@cs.unc.edu. It contains 1727 tasks, each with 1 modification.
If you use BOSS or find it interesting, please cite the following paper :)
@article{yang2025boss,
title={BOSS: Benchmark for Observation Space Shift in Long-Horizon Task},
author={Yang, Yue and Zhao, Linfeng and Ding, Mingyu and Bertasius, Gedas and Szafir, Daniel},
journal={arXiv preprint arXiv:2502.15679},
year={2025}
}