Render policy primitive + MCP Tools#72

Draft
merlerm wants to merge 3 commits into main from render_policy_primitive

merlerm (Collaborator) commented Mar 9, 2026

Opening this as a draft: I have only tested one run with it, and we are now at the limit until next week. I'd like to test it a bit more to see how it works before merging, but if you have time, let me know what you think:

  • Implemented render_policy as a primitive. It takes an env and an approach and returns a folder containing an episode "recorded" with that policy (as a series of single-frame .png files, since Claude Code does not like GIFs or MP4s).
  • I also created utils/episode.py to reuse some of the logic from run_experiment (rendering videos, loading the approach) without duplicating it. I'm not sure "episode.py" is the best name for it, though.
  • Added a simple MCP server so that render frame and render policy can be called as tools rather than primitives. @yichao-liang, can you also check whether the MCP server is implemented in a good way (I think you already had one set up for predicators)? The prompt also gets the MCP tool descriptions, just as it does for primitives.
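
To make the first bullet concrete, here is a minimal sketch of what a render_policy-style primitive could look like. It is not the code in this PR: the `ToyEnv` class, the `reset`/`step`/`render` interface, the callable `approach`, the `max_steps` parameter, and the `frame_0000.png` naming are all assumptions for illustration, and the PNG writer uses only the standard library so the sketch has no dependencies.

```python
import struct
import tempfile
import zlib
from pathlib import Path


def write_png(path, frame):
    """Write an RGB frame (a list of rows of (r, g, b) tuples) as an 8-bit PNG."""
    height, width = len(frame), len(frame[0])
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(channel for pixel in row for channel in pixel)
        for row in frame
    )

    def chunk(tag, data):
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR: width, height, bit depth 8, color type 2 (RGB), default methods.
    header = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", header)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )
    Path(path).write_bytes(png)


def render_policy(env, approach, out_dir, max_steps=50):
    """Roll out `approach` in `env` and save one PNG per step into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    obs = env.reset()
    write_png(out / "frame_0000.png", env.render())
    for t in range(1, max_steps + 1):
        obs, done = env.step(approach(obs))
        write_png(out / f"frame_{t:04d}.png", env.render())
        if done:
            break
    return out


class ToyEnv:
    """Hypothetical 1-D environment: a red dot moves right until the last cell."""

    def __init__(self, size=4):
        self.size = size
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = min(self.size - 1, self.pos + action)
        return self.pos, self.pos == self.size - 1

    def render(self):
        # A single-row image with the agent's cell drawn in red.
        return [[(255, 0, 0) if x == self.pos else (0, 0, 0) for x in range(self.size)]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = render_policy(ToyEnv(), lambda obs: 1, tmp)
        print(sorted(p.name for p in folder.glob("*.png")))
```

Returning a folder of numbered single-frame images means the agent can open any individual step directly instead of having to decode a video.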

A note: I also changed the default config so that the two renders are passed as tools rather than primitives. I'm not sure we want to keep this as the default; I can change it back to how it was before.
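
As a rough illustration of the config switch and of the prompt receiving tool descriptions in the same way as primitive descriptions, the sketch below builds a prompt section from a config. The `Capability` and `AgentConfig` classes, the `build_prompt` function, the section titles, and the `render_frame` name are hypothetical and do not reflect the actual config schema or prompt template in this repository.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    name: str
    description: str


@dataclass
class AgentConfig:
    # Hypothetical layout: each capability is listed under exactly one heading.
    primitives: list = field(default_factory=list)
    mcp_tools: list = field(default_factory=list)


def describe(title, items):
    """Format one prompt section, or return an empty string if it has no items."""
    if not items:
        return ""
    lines = [f"{title}:"] + [f"- {c.name}: {c.description}" for c in items]
    return "\n".join(lines)


def build_prompt(config):
    """Describe primitives and MCP tools with the same formatting."""
    sections = [
        describe("Primitives (importable in approach.py)", config.primitives),
        describe("MCP tools (callable by the agent only)", config.mcp_tools),
    ]
    return "\n\n".join(s for s in sections if s)


render_frame = Capability("render_frame", "Render a single frame of the environment.")
render_policy = Capability("render_policy", "Record an episode under a policy as PNG frames.")

# The default described in this PR: both renders exposed as tools.
renders_as_tools = AgentConfig(mcp_tools=[render_frame, render_policy])
# The previous behaviour: both renders exposed as primitives.
renders_as_primitives = AgentConfig(primitives=[render_frame, render_policy])

if __name__ == "__main__":
    print(build_prompt(renders_as_tools))
```

With a layout like this, switching the default back is a matter of moving the two entries from one list to the other.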

merlerm requested review from Jaraxxus-Me and tomsilver March 9, 2026 12:57

Jaraxxus-Me (Collaborator) left a comment

This looks good in general. We need some experiments to see whether the two MCP tools help in solving the hard environments.
I have two points on this and would like to hear feedback:

  • We might want to distinguish Tools, which are available for Claude to "understand" the environment, from Primitives, which are available for Claude to "solve" it. I lean towards not asking Claude to use Tools in approach.py, but always providing them in the system prompt (for it to use while generating approach.py).
  • Point 1 also implies that we might not want another LLM/VLM in the solution, because we don't allow it to render the state as part of the solution. I think this is good: we stay consistent with the "state space" of the environment (that is, the solution does not try to change the original state space), and we do not allow the solution to be infinitely expressive (e.g., by including another Claude Code instance that reads an image and writes new code at test time).

merlerm (Collaborator, Author) commented Mar 9, 2026

Point 1 is right: MCP tools are used directly by the agent while solving the problem, but they can't be used in approach.py, so this PR already creates that divide. I think that makes sense. In general, however, Claude can still render in the code if it wants to (it's just a couple of lines of code), and I have seen it do so even when it had the tools.
