YY-GX/BOSS

This is the official implementation of the paper "BOSS: Benchmark for Observation Space Shift in Long-Horizon Task".

Installation

BOSS is adapted from Libero and OpenVLA. Create the conda environment, then install the required packages by following the installation instructions of those two repos.

conda create --name boss python=3.10
# For more details, please follow the installation instructions in Libero & OpenVLA.

Download the assets folder (link) and put it under the repository root folder ./.

Download Libero-100 and put its LIBERO-90 part into the ./datasets folder. Then run the following command, which renames it and keeps only the single-skill tasks to form the boss_44 dataset.

python scripts/form_boss_44_dataset.py

BOSS Benchmark

The BOSS benchmark codebase has two parts:

  • Skills Training: train single skills with the 4 baseline algorithms: BC-RESNET-RNN, BC-RESNET-T, BC-VIT-T, and OpenVLA.
  • Challenges: 3 challenges, BOSS-CH1, BOSS-CH2, and BOSS-CH3.

Skills Training

For BC-*, execute the following commands.

# For BC-RESNET-RNN
python libero/lifelong/train_skills.py policy="bc_rnn_policy"
# For BC-RESNET-T
python libero/lifelong/train_skills.py policy="bc_transformer_policy"
# For BC-VIT-T
python libero/lifelong/train_skills.py policy="bc_vilt_policy"
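
To train all three BC-* baselines in one go, the commands above can be generated from a small table. The following is a minimal sketch, not part of the BOSS codebase: the helper names (`training_command`, `all_training_commands`) are hypothetical, and only the script path and the `policy=` values come from the commands above.

```python
import shlex

# Policy config names copied from the training commands above,
# keyed by the baseline label used in this README.
BC_POLICIES = {
    "BC-RESNET-RNN": "bc_rnn_policy",
    "BC-RESNET-T": "bc_transformer_policy",
    "BC-VIT-T": "bc_vilt_policy",
}


def training_command(policy_name: str) -> list[str]:
    """Return the argv list for training one BC-* baseline (hypothetical helper)."""
    return ["python", "libero/lifelong/train_skills.py", f"policy={policy_name}"]


def all_training_commands() -> dict[str, list[str]]:
    """Map each baseline label to its training command."""
    return {label: training_command(name) for label, name in BC_POLICIES.items()}


if __name__ == "__main__":
    # Dry run: print the commands instead of launching training.
    for label, cmd in all_training_commands().items():
        print(f"# {label}")
        print(shlex.join(cmd))
```

To launch the runs instead of printing them, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.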

For OpenVLA, execute the following commands.

cd openvla
zsh shells/run_openvla.sh

The checkpoint will be saved at runs/libero44/1.0.0/openvla-7b+libero44+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug.
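
The run folder name appears to encode the fine-tuning settings. The sketch below splits such a name into fields, which can help when several runs sit side by side. The field meanings (batch size, learning rate, LoRA rank, LoRA dropout) are an assumption read off the name itself, and `parse_run_name` is a hypothetical helper, not a function from OpenVLA or BOSS.

```python
def parse_run_name(run_name: str) -> dict:
    """Split an OpenVLA-style run folder name into labeled fields.

    Assumed layout: <model>+<dataset>+b<N>+lr-<x>+lora-r<N>+dropout-<x>--<suffix>
    """
    # Everything after the first "--" is treated as a free-form suffix.
    base, _, suffix = run_name.partition("--")
    parts = base.split("+")
    fields = {"model": parts[0], "dataset": parts[1], "suffix": suffix or None}
    for part in parts[2:]:
        if part.startswith("lr-"):
            fields["learning_rate"] = float(part[len("lr-"):])
        elif part.startswith("lora-r"):
            fields["lora_rank"] = int(part[len("lora-r"):])
        elif part.startswith("dropout-"):
            fields["lora_dropout"] = float(part[len("dropout-"):])
        elif part.startswith("b") and part[1:].isdigit():
            fields["batch_size"] = int(part[1:])
    return fields


if __name__ == "__main__":
    name = "openvla-7b+libero44+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug"
    for key, value in parse_run_name(name).items():
        print(f"{key}: {value}")
```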

Challenges

BOSS-CH1: Single Predicate Modification

For BC-*, execute the following commands.

# Test BC-RESNET-RNN when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-RNN when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-T when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-T when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-VIT-T when unaffected by OSS
python libero/lifelong/eval_skills_unaffected_by_oss.py \
--benchmark "boss_44" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-VIT-T when affected by OSS
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch1" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000

Checkpoints will be saved at ./experiments/boss_44/0.0.0/BC{algo}Policy_seed10000/run_00*/.

For OpenVLA, execute the following commands.

# Test OpenVLA when unaffected by OSS
python experiments/robot/libero/eval_openvla_ch1_ch2.py  --seed 10000 --task_suite_name "boss_44"

# Test OpenVLA when affected by OSS
python experiments/robot/libero/eval_openvla_ch1_ch2.py  --seed 10000 --task_suite_name "ch1"

Checkpoints will be saved at ./experiments/logs/.

BOSS-CH2: Accumulated Predicates Modification

For BC-*, execute the following commands.

# Test BC-RESNET-RNN when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-RNN when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-T when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-RESNET-T when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-VIT-T when affected by OSS - 2 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_2_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000

# Test BC-VIT-T when affected by OSS - 3 modifications
python libero/lifelong/eval_skills_affected_by_oss.py \
--benchmark "ch2_3_modifications" \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/" \
--seed 10000
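
The BC-* evaluations for BOSS-CH1 and BOSS-CH2 differ only in the script, the benchmark name, and the checkpoint folder, so the full set can be generated programmatically. This is a minimal sketch under two assumptions: that `boss_44` is the only benchmark evaluated with the "unaffected" script, and that the checkpoint folder always follows the `{PolicyClass}_seed{seed}/run_001/` pattern shown in the commands. The helper names are hypothetical.

```python
import shlex

# Policy class names and benchmark names as they appear in the commands above.
POLICY_CLASSES = ["BCRNNPolicy", "BCTransformerPolicy", "BCViLTPolicy"]
BENCHMARKS = ["boss_44", "ch1", "ch2_2_modifications", "ch2_3_modifications"]


def eval_command(policy_class: str, benchmark: str, seed: int = 10000) -> list[str]:
    """Build the argv list for one BC-* evaluation run (hypothetical helper)."""
    # Assumption: only boss_44 uses the "unaffected by OSS" script.
    script = (
        "eval_skills_unaffected_by_oss.py"
        if benchmark == "boss_44"
        else "eval_skills_affected_by_oss.py"
    )
    # Assumption: the run folder is always run_001 for the given seed.
    model_dir = f"./experiments/boss_44/0.0.0/{policy_class}_seed{seed}/run_001/"
    return [
        "python", f"libero/lifelong/{script}",
        "--benchmark", benchmark,
        "--model_path_folder", model_dir,
        "--seed", str(seed),
    ]


def evaluation_matrix(seed: int = 10000) -> list[list[str]]:
    """Return one command per (policy, benchmark) pair."""
    return [eval_command(p, b, seed) for p in POLICY_CLASSES for b in BENCHMARKS]


if __name__ == "__main__":
    # Dry run: print the 12 commands for inspection before executing any of them.
    for cmd in evaluation_matrix():
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to spot a mismatched benchmark name before spending compute on a run.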

For OpenVLA, execute the following commands.

# Test OpenVLA when affected by OSS - 2 modifications
python experiments/robot/libero/eval_openvla_ch1_ch2.py  --seed 10000 --task_suite_name "ch2_2_modifications"

# Test OpenVLA when affected by OSS - 3 modifications
python experiments/robot/libero/eval_openvla_ch1_ch2.py  --seed 10000 --task_suite_name "ch2_3_modifications"

Checkpoints will be saved at ./experiments/logs/.

BOSS-CH3: Real Long-Horizon Task

For BC-*, execute the following commands.

python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCRNNPolicy_seed10000/run_001/"

python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCTransformerPolicy_seed10000/run_001/"

python libero/lifelong/eval_skill_chain.py \
--seed 10000 --device_id 0 \
--model_path_folder "./experiments/boss_44/0.0.0/BCViLTPolicy_seed10000/run_001/"

For OpenVLA, execute the following commands.

cd openvla
python experiments/robot/libero/eval_openvla_ch3.py --seed 10000

Checkpoints will be saved at ./experiments/logs/ch3/.

Data Augmentation

RAMG

# Scale up the number of modified bddl files 
python RAMG/DA_bddl_files_scale_up_single_modification.py

# To include multiple modifications in a single bddl file, run the following command
python RAMG/DA_bddl_files_scale_up_multiple_modifications.py

Datasets

After scaling up the number of modified bddl files, generate the augmented dataset from the boss_44 dataset and the enlarged set of bddl files.

# Given boss_44 dataset and set of modified bddl files, generate the augmented dataset
python scripts/DA_demos_generation.py 

For OpenVLA, use openvla/experiments/robot/libero/regenerate_libero_dataset.py to regenerate the no-op dataset, then use rlds_dataset_builder to convert it to an RLDS dataset. For more details, see the README in the OpenVLA repo. To access our generated dataset, please contact yygx@cs.unc.edu. It contains 1727 tasks, each with 1 modification.

Citation

If you use BOSS or find it interesting, please cite the following paper :)

@article{yang2025boss,
  title={BOSS: Benchmark for Observation Space Shift in Long-Horizon Task},
  author={Yang, Yue and Zhao, Linfeng and Ding, Mingyu and Bertasius, Gedas and Szafir, Daniel},
  journal={arXiv preprint arXiv:2502.15679},
  year={2025}
}
