RDLLab/ROPRAS3

Reference-based Online POMDP Planning via Rapid State Space Sampling (ROP-RAS3)

arXiv ISRR

Overview

ROP-RAS3 is a fast online POMDP planner that searches for approximately optimal robot actions under transition and observation uncertainties. Its two key ingredients are:

  • A transformation of the POMDP value function that allows the optimal solution, with respect to a reference policy, to be computed partially analytically. This transformation reduces the numerical computation in solving POMDPs to estimating expectations, which in turn enables the use of fast Monte Carlo techniques for estimation (rather than a combination of estimation and optimisation techniques) in POMDP planning.
  • Vector-accelerated sampling-based motion planning (VAMP), which can compute sampling-based deterministic motion plans extremely quickly. This speed allows the rapid generation of diverse sets of macro-actions based on state-space information, forming reference policies that bias the search in the belief space.
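
As a rough illustration of how a reference policy can turn a maximisation into an expectation, the sketch below uses the standard KL-regularised backup: the value is the log of an expectation under the reference policy, and the improved policy is a reweighting of the reference policy. The function name, the temperature `lam`, and the toy Q-values are assumptions made for illustration. They are not taken from the ROP-RAS3 code base, whose exact formulation may differ.

```python
import math

def reference_based_backup(q_values, ref_probs, lam=1.0):
    """Closed-form KL-regularised backup (illustrative, not the repo's API).

    Solves max_pi sum_a pi(a) * q(a) - lam * KL(pi || ref) analytically:
      value     = lam * log sum_a ref(a) * exp(q(a) / lam)
      policy(a) is proportional to ref(a) * exp(q(a) / lam)
    """
    # Shift by the largest Q-value for numerical stability.
    q_max = max(q_values)
    weights = [p * math.exp((q - q_max) / lam) for q, p in zip(q_values, ref_probs)]
    total = sum(weights)
    value = q_max + lam * math.log(total)
    policy = [w / total for w in weights]
    return value, policy

# Toy numbers (hypothetical): three macro-actions proposed by a reference policy.
q = [1.0, 2.0, 0.5]
ref = [0.2, 0.5, 0.3]
value, policy = reference_based_backup(q, ref)
print(value, policy)
```

Because the value is an expectation under the reference policy, it can in principle be estimated by sampling actions from that policy instead of optimising over all actions.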

demo video

The videos above demonstrate ROP-RAS3 handling complex planning problems that require a lookahead of more than 3000 steps. The robot does not fully observe its own configuration or the poses of the cylinders. It initially aims to gather information about the cylinders. Once it is more certain about the environment, as the belief particles (shown as yellow ghost objects) sharpen around the true states, it proceeds to remove only the necessary obstacles, leaving a valid path to retrieve the target cylinder (initially at the back) and place it at the intended location at the top of the shelf.

Installation

ROP-RAS3 is built on the pomdp-py and vamp repos.

We recommend using python==3.10 and using uv to manage virtual environments and Python packages.

These installation instructions assume the following packages are already installed: cmake, gcc/g++, and git.

Once you have uv installed, in the project root directory, create a virtual environment,

uv venv --python 3.10

Install python dev packages,

sudo apt install python3-dev libeigen3-dev

To install POMDP planning components,

uv pip install Cython
cd pomdp-py
uv pip install -e .

To install vamp,

cd ../vamp
uv pip install -e .

Finally, from the project root, make sure your virtual environment is activated before running,

source .venv/bin/activate

Run

For manipulation tasks, make sure you have compiled the panda arm for vamp,

python vamp/resources/problem_tar_to_pkl_json.py --robot panda

To run an experiment with visualisations on:

python scripts/run_scripts.py --visualise --runs 1 --problem_name maze2d

Available problems include maze2d, light_dark, random3d, sphere_search, capture, table_ray_pick, shelf_search.

Another example runs shelf_search once with 100 simulations per planning step and saves the video:

python scripts/run_scripts.py --problem_name shelf_search --runs 1 --visualise --file_logging --num_sims 100 --video_recording

Warning: --video_recording currently saves every video under the same name, so each new recording overwrites the existing one.
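
One way to work around this is to move each recording to a unique name after the run finishes. The helper below is a minimal sketch: `archive_recording` is a hypothetical function that is not part of the repository, and the path of the saved video is an assumption you must supply yourself.

```python
import shutil
import time
from pathlib import Path

def archive_recording(video_path, archive_dir, tag):
    """Move a finished recording to a unique name so the next run cannot overwrite it."""
    src = Path(video_path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{tag}_{stamp}{src.suffix}"
    # Add a counter if two runs finish within the same second.
    counter = 1
    while dest.exists():
        dest = dest_dir / f"{tag}_{stamp}_{counter}{src.suffix}"
        counter += 1
    shutil.move(str(src), str(dest))
    return dest
```

Calling it after each run, for example with `tag="shelf_search"`, keeps every recording.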

Other command line arguments include:

  1. --file_logging: record experiment statistics in the problem folder (e.g. pomdp-py/pomdp_py/problems/shelf_search/log)
  2. --runs: the number of runs for this problem
  3. --problem_name: which problem to run
  4. --steps: the maximum number of episode steps for the problem
  5. --planner_name: which planner to use (default: ROP-RAS3)
  6. --time_out: the maximum planning time per step; use -1 for unlimited, in which case --num_sims must be set (see below)
  7. --no_pomdp: when ROP-RAS3 is selected, this flag turns off POMDP planning, which gives belief VAMP
  8. --ref_policy_heuristic: select between "uniform" and "entropy"
  9. --finite_ref_actions: only useful when the planner is POMCP; it turns on the RPOMCP baseline
  10. --video_recording: whether to record a video for each run
  11. --num_sims: the number of simulated episodes to sample
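
For batch experiments it can help to assemble these commands programmatically. The sketch below uses only flags documented above. `build_run_command` is a hypothetical helper, not part of the repository, and it assumes that --visualise, --file_logging and --video_recording are boolean switches (as in the examples above) and that --time_out takes a numeric value.

```python
import shlex

def build_run_command(problem_name, runs=1, num_sims=None, time_out=None,
                      visualise=False, file_logging=False, video_recording=False):
    """Assemble an argument list for scripts/run_scripts.py from the documented flags."""
    # With unlimited planning time (-1), the simulation count is the only stopping rule.
    if time_out == -1 and num_sims is None:
        raise ValueError("--time_out -1 requires --num_sims to be set")
    cmd = ["python", "scripts/run_scripts.py",
           "--problem_name", problem_name, "--runs", str(runs)]
    if num_sims is not None:
        cmd += ["--num_sims", str(num_sims)]
    if time_out is not None:
        cmd += ["--time_out", str(time_out)]
    for flag, enabled in (("--visualise", visualise),
                          ("--file_logging", file_logging),
                          ("--video_recording", video_recording)):
        if enabled:
            cmd.append(flag)
    return cmd

cmd = build_run_command("shelf_search", num_sims=100, visualise=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` from the project root.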

File Structures

Everything related to vamp is organised under vamp/, and everything related to POMDP planning is organised under pomdp-py/.

Within pomdp-py the file structure is,

pomdp_py/
├── __init__.py
├── __main__.py
├── algorithms/
│   ├── ... # POMDP planning algorithms; ROP-RAS3 is implemented here
├── framework/
│   ├──... # base templates
├── problems/
│   ├── ... # problem definitions, each in their own subdirectories
├── representations/
│   ├── ... # particle beliefs and distribution functions
└── utils/
    ├── ... # auxiliary helper functions

There are two separate running pipelines, one for navigation tasks and one for manipulation tasks. They are located in pomdp-py/pomdp_py/problems/motion_planning/runner.py and pomdp-py/pomdp_py/problems/sphere_search/runner/runner.py respectively.

Create Your Own Environments

You can create a new environment by initialising a new folder in the pomdp-py/pomdp_py/problems/ directory. Our recommended layout has the following components within your problem folder:

  1. domain/: all the files that define your POMDP models, including actions, states, observations, the observation model, the transition model and reward classes. In addition, it should contain a reference_policy_model that builds your reference policy by querying vamp.
  2. env/: the actual physical environment, including obstacles, robots, walls, etc.
  3. problem.py: the entry point for running the experiment. It should put everything together, including the planning and the experiment pipeline.
  4. utils: any helper classes and functions, typically the particle reinvigorator.
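
The layout above can be scaffolded with a few lines of Python. This is a minimal sketch: `scaffold_problem` is a hypothetical helper, and the `__init__.py` files and treating utils as a directory are assumptions about how the packages are imported, not something the repository prescribes.

```python
from pathlib import Path

def scaffold_problem(problems_dir, name):
    """Create the recommended skeleton for a new problem folder."""
    root = Path(problems_dir) / name
    for sub in ("domain", "env", "utils"):
        pkg = root / sub
        pkg.mkdir(parents=True, exist_ok=True)
        # Assumption: each subfolder is imported as a Python package.
        (pkg / "__init__.py").touch()
    (root / "__init__.py").touch()
    problem_file = root / "problem.py"
    if not problem_file.exists():
        problem_file.write_text(
            '"""Entry point: wires the models, planner and experiment pipeline together."""\n'
        )
    return root
```

For example, `scaffold_problem("pomdp-py/pomdp_py/problems", "my_problem")` creates the folders without touching an existing problem.py.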

Best Practices

Particle Beliefs

ROP-RAS3 uses particle beliefs and sampling importance resampling (SIR) to update the belief between planning steps. A key trade-off is the number of particles: more particles bring the approximation closer to the true distribution, but the SIR updates take longer. We typically found that 500 to 1000 particles work well. Particle depletion may still occur. In the examples we provide, depletion is most likely caused by the true state of the robot being near a terminal state (danger zone, goal, etc.), so that all particles terminate while the true state does not. We therefore also implement particle reinvigoration in each problem to generate new particles when necessary.
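
The sketch below shows a generic SIR step with a reinvigoration fallback. It is not the repository's implementation: `sir_update` and the callables `transition`, `likelihood` and `reinvigorate` are hypothetical names, and the toy one-dimensional models are chosen only to make the example runnable.

```python
import random

def sir_update(particles, action, observation, transition, likelihood,
               reinvigorate, rng=random):
    """One sampling importance resampling (SIR) step over a particle belief."""
    # 1. Propagate every particle through the transition model.
    predicted = [transition(s, action, rng) for s in particles]
    # 2. Weight each predicted particle by the observation likelihood.
    weights = [likelihood(observation, s, action) for s in predicted]
    if sum(weights) <= 0.0:
        # Particle depletion: no particle explains the observation.
        return reinvigorate(len(particles), observation, rng)
    # 3. Resample with replacement in proportion to the weights.
    return rng.choices(predicted, weights=weights, k=len(particles))

# Toy models (assumptions): integer states, exact observations.
def move(state, action, rng):
    return state + action

def exact_match(observation, state, action):
    return 1.0 if state == observation else 0.0

def respawn(n, observation, rng):
    return [observation] * n

belief = sir_update([0, 1, 2, 2], 1, 3, move, exact_match, respawn, random.Random(0))
print(belief)
```

In a real problem the reinvigorator would draw new states consistent with the latest observation instead of copying it.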

Search Depth

In general, we found that setting the search depth to roughly the number of actions needed to reach the goal in the deterministic setting is a good starting point. The optimal search depth may need to be slightly larger than this to account for uncertainty.

Discount factor

For environments with very long horizons, a discount factor of 0.99 is generally not high enough. Set the discount factor to 0.999 or even higher, but below 1.
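
A quick calculation shows why. Using the 3000-step lookahead mentioned in the overview as an example, a reward 3000 steps ahead is weighted on the order of 1e-13 with a discount of 0.99, but by about 0.05 with 0.999. The helper names below are illustrative only.

```python
def discount_weight(gamma, step):
    """Weight that a reward received `step` steps ahead contributes to the return."""
    return gamma ** step

def effective_horizon(gamma):
    """Rule of thumb: rewards beyond roughly 1 / (1 - gamma) steps are heavily discounted."""
    return 1.0 / (1.0 - gamma)

for gamma in (0.99, 0.999, 0.9999):
    print(f"gamma={gamma}: weight at step 3000 = {discount_weight(gamma, 3000):.3e}, "
          f"effective horizon ~ {effective_horizon(gamma):.0f} steps")
```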

Log Replay Commands

We also support reading log files and replaying them in pybullet,

python scripts/log_replay.py --problem [problem for the log] --log [the intended log file path] --video_recording

Citations

If you find this research useful for your own work, please use the following citations:

@inproceedings{Liang2026Thinking,
	title        = {Think Fast and Far: Long-Horizon Online POMDP Planning via Rapid State Sampling},
	author       = {Yuanchu Liang and Edward Kim and J.Arden Knoll and Wil Thomason and Zachary Kingston and Lydia E. Kavraki and Hanna Kurniawati},
	year         = 2026,
	booktitle    = {International Journal of Robotics Research (to appear)}
}
@inproceedings{Liang2024Scaling,
	title        = {Scaling Long-Horizon Online {POMDP} Planning via Rapid State Space Sampling},
	author       = {Yuanchu Liang and Edward Kim and Wil Thomason and Zachary Kingston and Hanna Kurniawati and Lydia E. Kavraki},
	year         = 2024,
	booktitle    = {International Symposium on Robotics Research}
}
