An implementation of CC-POMCP#68
Conversation
…r; added example problem for CCPOMCP.
|
Thanks for the effort. Much better organized than before. Will take a pass soon. One thing - the CIs are failing. Could you fix this first? I just merged this PR that will make CI actions trigger when you update the source branch of this PR. Please update your branch with the latest changes in main. |
|
I merged the changes from |
|
Great. Looks like CIs passed except for pre-commit. I will review the code soon. |
There was a problem hiding this comment.
Apologies for the super delay... I was planning to test this out locally and merge if there's no issue. But when I ran python -m pomdp_py -r ccrocksample, the program failed due to particle deprivation - I think you should have at least one working example of your algorithm. I was curious what advantage or difference CC-POMCP has over POMCP but this is not clear to me. I hope that your example could make that clear. Thanks.
*** Testing CC-POMCP ***
==== Step 1 ====
True state: State((0, 3) | ('good', 'good', 'bad', 'bad', 'good', 'good', 'good', 'good') | False)
Action: check-5
Observation: good
Response: reward=0, cost=1
Response (Cumulative): reward=0, cost=1
Response (Cumulative Discounted): reward=0, cost=1
__num_sims__: 10000
__plan_time__: 12.15742
World:
______ID______
4......>
7....0.>
3..2...>
R......>
..6....>
1......>
.....5.>
_____G/B_____
$......>
$....$.>
x..x...>
R......>
..$....>
$......>
.....$.>
==== Step 2 ====
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/kzheng/repo/pomdp-py/pomdp_py/__main__.py", line 51, in <module>
main()
File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 215, in main
total_response, total_discounted_response = test_planner(
File "/home/kzheng/repo/pomdp-py/pomdp_py/problems/cc_rocksample/cc_rocksample_problem.py", line 158, in test_planner
ccpomcp.update(cc_rocksample.agent, action, real_observation)
File "pomdp_py/algorithms/pomcp.pyx", line 96, in pomdp_py.algorithms.pomcp.POMCP.update
cpdef update(self, Agent agent, Action real_action, Observation real_observation,
File "pomdp_py/algorithms/pomcp.pyx", line 120, in pomdp_py.algorithms.pomcp.POMCP.update
agent.set_belief(particle_reinvigoration(tree_belief,
File "pomdp_py/representations/belief/particles.pyx", line 29, in pomdp_py.representations.belief.particles.particle_reinvigoration
raise ValueError("Particle deprivation.")
ValueError: Particle deprivation.
Also, I noticed the added unit test for cvec; That's good. Could you add a test that involves calling CC-POMCP? It's true that such tests aren't there yet even for existing algorithms, but it'd be good for new algorithms to have tests still.
Left other comments, mostly minor and docstring-related. I hope the delay here didn't delay the progress of your work. Thanks again for the contribution!
| cdef void vector_copy(double[:] src, double[:] dst) | ||
|
|
||
|
|
||
| cdef class Vector: |
There was a problem hiding this comment.
Could you add a docstring that describes what this is for?
| Option | ||
| ) | ||
| from typing import Optional | ||
|
|
There was a problem hiding this comment.
What is the purpose of generalization? Please add docstring
| raise NotImplementedError | ||
|
|
||
|
|
||
| cdef class ResponseModel: |
There was a problem hiding this comment.
How does "response" differ from "observation" or "reward"? Please clarify. Please include relevant references since this is not common terminology. If it is a wrapper for "reward", why is this wrapper necessary?
Perhaps, it is best to point out the gist of CC-POMCP: Maximize cumulative reward while constraining cumulative cost.
| print("Observation: %s" % str(real_observation)) | ||
| print("Response: %s" % str(env_response)) | ||
| print("Response (Cumulative): %s" % str(total_response)) | ||
| print("Response (Cumulative Discounted): %s" % str(total_discounted_response)) |
There was a problem hiding this comment.
To showcase the benefit of CC-POMCP, you should indicate what the cost constraint is, and demonstrate that the constraint isn't crossed.
| State, | ||
| ) | ||
|
|
||
|
|
There was a problem hiding this comment.
Would be really nice to have some other domains; Rocksample is really old and kind of detached from reality. If cost-constraint is really useful it should find applications in other more realistic domains.
| self._rollout_policy = rollout_policy | ||
|
|
||
| cpdef QNode _create_qnode( | ||
| self, |
There was a problem hiding this comment.
This lgtm. Please add docstring though; Looks like it's necessary to create QNode with parameters that differ from default ones e.g. self._num_visits_init in CC-POMCP. Why?
This pull request implements the cost-constrained POMCP (CC-POMCP) algorithm and its dependencies. The algorithm is in
algorithms/ccpomcp.p*, while the dependencies are in theframework/generalization.p*andutils/cvec.p*files. This pull request also proposes a generic model, called aResponseModel, and a corresponding output, called aResponse. The name "response" comes from the notion of independent and dependent variables, where a response (reward, cost, etc.) depends on the interaction with the real or simulated environment. Thus, a response model is a wrapper for more specific models, such as reward and cost models (and any others that will follow in the future). By extension, a response is a wrapper for the reward, cost, etc.The
framework/generalization.p*files containThe pull request has the following:
test_util_vector_ops.py), and