This codebase contains the implementation of the ViPlan benchmark.
The project is divided into the following main sections:
- Source code: viplan
- Notebooks: notebooks (mostly to visualize results)
- Scripts to run the benchmark: sh_scripts
- Data: data
The ViPlan benchmark is made up of several components, including the main experiment code and specific code for the two environments (Blocksworld and Household).
To run the experiments, you need to install the required packages. We recommend mamba and provide an environment file, environment.yml, from which the virtual environment can be created. The examples below use mamba.
Using mamba:
mamba env create -p ./viplan_env -f environment.yml
mamba activate ./viplan_env
Warning
Using conda is not officially supported, but if you prefer it, replace mamba with conda everywhere (including in the sh_scripts) and it should work. For example:
conda env create -p ./viplan_env -f environment.yml
conda activate ./viplan_env
If you wish to use Flash Attention, it needs to be installed separately with the following command:
pip install flash-attn --no-build-isolation
Warning
At the time of writing, Molmo has an issue with the latest version of transformers (>= 4.51.0). To run Molmo, please downgrade transformers to version 4.50.3 with pip install transformers==4.50.3.
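Before launching a Molmo run, it can help to check the installed transformers version against the constraint above. The following is a minimal sketch and not part of ViPlan: the helper names parse_version and molmo_compatible are hypothetical, and the parsing assumes a plain numeric version string such as 4.50.3.

```python
import re
from importlib import metadata

# First transformers release reported above as problematic for Molmo.
FIRST_BROKEN = (4, 51, 0)


def parse_version(version: str) -> tuple:
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    # A missing patch component (e.g. "4.52") is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def molmo_compatible(version: str) -> bool:
    """True if the given transformers version is older than 4.51.0."""
    return parse_version(version) < FIRST_BROKEN


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed in this environment")
    else:
        status = "OK for Molmo" if molmo_compatible(installed) else "too new for Molmo"
        print(f"transformers {installed}: {status}")
```

If the check reports a version that is too new, the pip install transformers==4.50.3 command above applies.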
The Blocksworld environment is based on the Photorealistic Blocksworld renderer, which in turn is based on Blender. To install the renderer, run the following command from the root directory of the repository:
./setup_blocksworld.sh
Additionally, the libxi package needs to be installed (e.g., sudo apt-get install libxi) or available in the cluster.
iGibson has the following specific requirements:
- apptainer (formerly Singularity)
- Encryption key, to be requested at this link
The Household environment is instead based on a custom version of iGibson. To install the environment, first clone our fork of iGibson and the BEHAVIOR repository:
git clone --depth 1 --single-branch --branch release_viplan https://github.com/nicoladainese96/iGibson.git ./iGibson --recursive
git clone https://github.com/StanfordVL/behavior.git
Since iGibson requires specific packages, we recommend running it inside a container. Our code is designed to work with Apptainer. To pull the image, run:
apptainer cache clean
apptainer pull docker://igibson/igibson:latest
This will create a file called igibson_latest.sif (it should take approximately 15 minutes), which is expected to be in the root directory. This file is a Singularity image that contains all the dependencies needed to run iGibson. To open a shell inside the container, run:
apptainer exec --nv igibson_latest.sif bash
Then, install the iGibson dependencies from inside the container:
python -m venv --system-site-packages ./igibson_env
source igibson_env/bin/activate
pip install -e ./iGibson
pip install -e ./behavior
pip install notebook pyquaternion shapely uvicorn fastapi unified_planning
pip install unified_planning[engines]
Afterwards, the iGibson custom assets need to be downloaded, following the instructions at this page. To download the assets, run:
cd iGibson
wget --no-check-certificate https://storage.googleapis.com/gibson_scenes/ig_dataset.tar.gz
mkdir igibson/data
tar -xzvf ig_dataset.tar.gz -C ./igibson/data
Then, still in the iGibson folder, from inside the container run:
python -m igibson.utils.assets_utils --download_assets
python -m igibson.utils.assets_utils --download_demo_data
As some of the assets are encrypted, you will need to download the key provided by the iGibson team. The key can be requested by filling out the form at this link and then needs to be placed inside the iGibson folder under igibson/data/igibson.key.
After this, the iGibson environment is ready to be used. For the benchmark, we use a client-server architecture, where the server runs inside the container and the client runs in the main execution environment. Scripts are provided in the sh_scripts folder to run the server and the client.
Warning
iGibson will create many temporary files under iGibson/igibson/data/ig_dataset/scene_instances, which are not removed automatically. The folder is safe to delete to clear up space.
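The cleanup described in the warning can be scripted. The sketch below is not part of ViPlan: the function name clear_scene_instances is hypothetical, the path is the one given above relative to the repository root, and it assumes no simulation is running while the folder is removed.

```python
import shutil
from pathlib import Path

# Path taken from the warning above, relative to the repository root.
SCENE_INSTANCES = Path("iGibson/igibson/data/ig_dataset/scene_instances")


def clear_scene_instances(repo_root: str = ".") -> int:
    """Delete the scene_instances folder and return the number of files removed."""
    target = Path(repo_root) / SCENE_INSTANCES
    if not target.is_dir():
        return 0  # Nothing to clean up.
    n_files = sum(1 for p in target.rglob("*") if p.is_file())
    shutil.rmtree(target)
    return n_files


if __name__ == "__main__":
    removed = clear_scene_instances(".")
    print(f"Removed {removed} temporary files")
```

Running it from the repository root removes the whole folder, including any subfolders.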
In order to run some open-source models, you might need to accept their conditions on the Hugging Face Hub. Then, you can include your token in the bash environment by running the following command:
export HF_TOKEN=<your_token>
Similarly, in order to run closed-source models, include your API key in the bash environment by running the following commands:
export OPENAI_API_KEY=<your_key>
export GEMINI_API_KEY=<your_key>
export ANTHROPIC_API_KEY=<your_key>
The benchmark consists of two main experiment types, each implemented as an environment-agnostic Python script:
- VLM-as-Grounder (viplan.experiments.benchmark_vlm_as_grounder): The VLM predicts symbolic predicates from visual observations, which are then used by a classical planner to generate actions.
- VLM-as-Planner (viplan.experiments.benchmark_vlm_as_planner): The VLM directly outputs actions from visual observations.
Both scripts work with any supported environment by specifying the --domain_name parameter (viplan-bw for Blocksworld, viplan-hh for Household/iGibson).
python3 -m viplan.experiments.benchmark_vlm_as_grounder \
--model_name "OpenGVLab/InternVL3-8B" \
--domain_name "viplan-bw" \
--domain_file "data/planning/blocksworld/domain.pddl" \
--problems_dir "data/planning/blocksworld/problems/simple" \
--prompt_path "data/prompts/benchmark/blocksworld/prompt.md" \
--root_path "." \
--output_dir "results/my_experiment" \
--seed 1
Experiment variants for VLM-as-Grounder:
| Variant | Flag(s) | Description |
|---|---|---|
| Default | (none) | Standard Yes/No QA prompt (prompt.md) |
| Chain-of-Thought (CoT) | --use_cot_prompt (shell scripts) | Uses the CoT prompt variant (prompt_cot.md) |
| With memory (Mem) | --include_prompt_history | Injects previous-step failure context into the VLM prompt |
| Mem + CoT | --include_prompt_history --use_cot_prompt | Combines CoT prompting with memory |
python3 -m viplan.experiments.benchmark_vlm_as_planner \
--model_name "OpenGVLab/InternVL3-8B" \
--domain_name "viplan-bw" \
--domain_file "data/planning/blocksworld/domain.pddl" \
--problems_dir "data/planning/blocksworld/problems/simple" \
--prompt_path "data/prompts/planning/vila_blocksworld_json.md" \
--root_path "." \
--output_dir "results/my_experiment" \
--max_steps 10 \
--seed 1
Experiment variants for VLM-as-Planner:
| Variant | Flag(s) | Prompt selected |
|---|---|---|
| Default | (none) | vila_{env}_json.md |
| Chain-of-Thought (CoT) | --use_cot_prompt | vila_{env}_json_cot.md |
| Act | --act_prompt | act_{env}_json.md |
| Act + CoT | --use_cot_prompt --act_prompt | react_{env}_json.md |
where {env} is blocksworld or igibson. The prompt is selected automatically by the shell scripts; when running Python directly, pass the desired prompt via --prompt_path.
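When running Python directly, the mapping from variant to prompt file has to be applied by hand. The sketch below reproduces the table as a small helper. It is illustrative only: the function name planner_prompt_path is hypothetical, and the directory data/prompts/planning is taken from the example command and assumed to hold all four prompt variants.

```python
from pathlib import Path

# Directory taken from the example command; assumed to hold every variant.
PROMPT_DIR = Path("data/prompts/planning")


def planner_prompt_path(env: str, use_cot: bool = False, act: bool = False) -> Path:
    """Map (env, CoT, Act) to the prompt file listed in the variants table."""
    if env not in ("blocksworld", "igibson"):
        raise ValueError(f"Unknown env: {env!r}")
    if act and use_cot:
        name = f"react_{env}_json.md"
    elif act:
        name = f"act_{env}_json.md"
    elif use_cot:
        name = f"vila_{env}_json_cot.md"
    else:
        name = f"vila_{env}_json.md"
    return PROMPT_DIR / name


if __name__ == "__main__":
    # Print the value to pass via --prompt_path for each combination.
    for cot in (False, True):
        for act in (False, True):
            print(cot, act, planner_prompt_path("blocksworld", cot, act))
```

The returned path is what would be passed to --prompt_path.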
For iGibson experiments, replace --root_path with --base_url pointing to the running iGibson server:
python3 -m viplan.experiments.benchmark_vlm_as_grounder \
--model_name "OpenGVLab/InternVL3-8B" \
--domain_name "viplan-hh" \
--domain_file "data/planning/igibson/domain.pddl" \
--problems_dir "data/planning/igibson/simple" \
--prompt_path "data/prompts/benchmark/igibson/prompt.md" \
--base_url "http://localhost:8900" \
--output_dir "results/my_experiment" \
--seed 1
An oracle planner baseline is also available for iGibson:
python3 -m viplan.experiments.benchmark_igibson_oracle \
--base_url "http://localhost:8900" \
--domain_file "data/planning/igibson/domain.pddl" \
--problems_dir "data/planning/igibson/simple" \
--output_dir "results/oracle" \
--max_steps 10 \
--seed 1
Note
The iGibson environment uses a client-server architecture. The simulation server must be started inside the Apptainer container before running experiments. See the iGibson setup section and the scripts in sh_scripts/ for details on starting the server.
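Since the client fails if the server is not up, a quick reachability check before launching an experiment can save time. The sketch below only tests whether a TCP connection to the host and port of --base_url succeeds. It makes no assumption about the server's HTTP endpoints, and the function name server_reachable is hypothetical.

```python
import socket
from urllib.parse import urlparse


def server_reachable(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host and port in base_url succeeds."""
    parsed = urlparse(base_url)
    host = parsed.hostname
    if host is None:
        raise ValueError(f"No host in URL: {base_url!r}")
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://localhost:8900"  # Same address as in the example commands.
    print(f"{url} reachable: {server_reachable(url)}")
```

A successful connection shows only that something is listening on the port, not that the simulator has finished loading.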
We also provide bash scripts to run experiments locally as well as SLURM scripts to run on a cluster. The scripts are located in the sh_scripts folder. See the sh_scripts README for more details on available flags and how to use them. If you are using a different cluster manager, you may need to modify the SLURM scripts at sh_scripts/slurm_cluster accordingly.
We include all the results from the experiments reported in the paper in the results folder. To process and visualize them, we provide Jupyter notebooks in the notebooks folder. These notebooks reproduce all the figures and tables reported in the paper.
ViPlan can be easily extended by the community to include new domains, models and methods.
In order to add a new domain, the following steps are needed:
- Add a new subfolder in data/planning/ with a PDDL domain file and per-split problem files.
- Implement the domain simulator under viplan/planning/, as a subclass of PlanningSimulator.
- Update get_domain_config in viplan/code_helpers.py with the domain-specific logic.
- Add prompts under data/prompts/.
Once this is done, the sh_scripts can run the new domain with minimal changes.
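The data-related steps (the PDDL files and the prompts) can be sanity-checked with a short script. This is a sketch under assumptions rather than a ViPlan utility: the function name check_new_domain is hypothetical, the domain file is assumed to be called domain.pddl as in the example commands, and prompt files are assumed to carry the domain name somewhere in their path. The simulator subclass and the get_domain_config update are not covered, since they depend on internal APIs not described here.

```python
from pathlib import Path


def check_new_domain(repo_root: str, name: str) -> list:
    """Return a list of problems found for domain `name` (empty if none)."""
    root = Path(repo_root)
    issues = []

    domain_dir = root / "data" / "planning" / name
    domain_file = domain_dir / "domain.pddl"  # Assumed name, as in the examples.
    if not domain_file.is_file():
        issues.append(f"missing PDDL domain file: {domain_file}")

    # Problem files may live in per-split subfolders, so search recursively.
    problems = []
    if domain_dir.is_dir():
        problems = [p for p in domain_dir.rglob("*.pddl") if p != domain_file]
    if not problems:
        issues.append(f"no PDDL problem files under: {domain_dir}")

    # Assumption: prompt paths for a domain contain the domain name.
    prompts_dir = root / "data" / "prompts"
    prompts = []
    if prompts_dir.is_dir():
        prompts = [
            p for p in prompts_dir.rglob("*.md")
            if name in p.relative_to(prompts_dir).as_posix()
        ]
    if not prompts:
        issues.append(f"no prompts mentioning '{name}' under: {prompts_dir}")

    return issues


if __name__ == "__main__":
    for issue in check_new_domain(".", "blocksworld"):
        print(issue)
```

An empty list means the assumed layout is in place. It does not validate the PDDL itself.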
ViPlan provides integration with vLLM to run open-source VLMs. As the framework is updated, newly released models should be supported with minimal changes. We also provide interfaces to the OpenAI, Gemini and Anthropic APIs to run closed-source models, which should likewise support new models from these providers with minimal changes.
Edits to the existing VLM-as-planner and VLM-as-grounder methods can be made in the viplan/experiments folder. To add a completely new method, a new script can be added to the same folder, following the same structure as the existing ones.
