Skip to content

ZzZZCHS/RoboGround

Repository files navigation

RoboGround

Code & data for "RoboGround: Robotic Manipulation with Grounded Vision-Language Priors" (CVPR 2025) [Project Page] [Paper]

🔨 Environment Setup

  • Clone the repository with submodules

    git clone --recurse-submodules https://github.com/ZzZZCHS/RoboGround.git
    cd RoboGround
  • Create and activate the Conda environment

    conda env create -f roboground.yml
    conda activate roboground
  • Install dependencies

    pip install -e robosuite
    pip install -e robomimic
    pip install -e robocasa
    
    pip install PyOpenGL==3.1.9
  • Set up the private macro file for robocasa

    cd robocasa
    python robocasa/scripts/setup_macros.py

Data

  • Download and extract assets

    Download the following asset files from from RoboGround_Data:

    • robosuite_assets.tar.gz
    • robocasa_assets.tar.gz

    Then, extract them into their respective directories:

    tar -xzvf /path/to/robosuite_assets.tar.gz -C robosuite/robosuite/models/
    tar -xzvf /path/to/robocasa_assets.tar.gz -C robocasa/robocasa/models/
  • Download and extract generated data Download the generated data files (all files with extensions .zip, .z01, .z02, .z03) from RoboGround_Data.

    Next, use 7-Zip to unzip the files:

    • First, download and extract 7-Zip:

      wget https://www.7-zip.org/a/7z2409-linux-x64.tar.xz
      mkdir 7zip
      tar -xvf 7z2409-linux-x64.tar.xz -C 7zip

      Reminder: If you can directly install 7-Zip using your system's package manager (e.g., sudo apt install p7zip-full on Ubuntu), you can skip the manual installation process above. In this case, make sure to modify ./7zip/7zz in scripts/unzip_files.sh to use the default 7z command.

    • Then, unzip the downloaded data files using the provided script:

      bash scripts/unzip_files.sh /path/to/RoboGround_Data
    • After successful extraction, you can remove the .zip and .z* files:

      rm /path/to/RoboGround_Data/*.z*
  • Visualize demonstrations:

    cd robomimic
    bash robomimic/scripts/run_visualization.sh /path/to/TASK_NAME.hdf5
  • Data generation:

    We created a custom dataset based on robocasa's demonstrations by introducing object distractors and generating new instructions.

    • First, add objects and generate appearance-based instructions:

      cd robomimic
      python robomimic/scripts/generate_demos.py \
          --dataset /path/to/hdf5_data \  # Path to the original HDF5 dataset
          --n 3000 \  # Number of demonstrations to process (use a small number for debugging)
          --camera_height 512 --camera_width 512 \  # The size of observation images/masks
          --save_new_data --save_obs \  # Enable saving of augmented data and observations
          --write_gt_mask \  # Save ground-truth masks for target objects and placement areas
          --write_video \  # Save video visualizations if needed
          --use_actions  # Replay original robot actions
    • Once new demonstrations are generated, create corresponding spatial and commonsense instructions. Update the file_paths and output_dir variables in the following scripts:

      cd data_gen/gpt
      python generate_spatial_instructions.py
      python generate_common_instructions.py

Training & Evaluation

Using the generated dataset, we first train a grounded vision-language model (VLM) to detect target objects and placement areas. For implementation details, refer to our groundingLMM repository.

Next, we train a robot policy using the GR-1 framework. Full implementation details can be found in our GR-1 repository.

TODO

  • Data release.
  • Code and instruction for data generation.
  • Code and instruction for model training&evaluation.

😊 Acknowledgement

Thanks to the open source of the following projects: robocasa, robomimic, robosuite, GLaMM

About

[CVPR 2025] RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors