Render policy primitive + MCP Tools by merlerm · Pull Request #72 · tomsilver/robocode

merlerm · 2026-03-09T12:57:15Z

Opening this as a draft: I have only tested one run with it, and we are now at the limit until next week. I'd like to test it more before merging, but if you have time, let me know what you think:

Implemented render_policy as a primitive: it takes an env and an approach and returns a folder containing an episode "recorded" with that policy (as a series of single-frame .png files, since Claude Code does not like GIFs or MP4s).
Created utils/episode.py to reuse some of the logic from run_experiment (rendering videos, loading the approach) without duplicating it. I'm not sure "episode.py" is the best name for this, though.
Added a simple MCP server where render frame and render policy can be called as tools rather than primitives. @yichao-liang, can you also take a look to see whether the MCP server is implemented well (I think you already had one set up for predicators)? The prompt also gets MCP tool descriptions, just as it does for primitives.
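For readers who want a concrete picture of the first item, here is a minimal sketch of the "roll out a policy and save one PNG per frame" idea. It is not the code in this PR: `ToyEnv`, `write_png`, the `frame_0000.png` naming, and the `render_policy(env, policy, out_dir, max_steps)` signature are all assumptions made for illustration, and the PNG writer uses only the standard library so the example runs on its own.

```python
import struct
import tempfile
import zlib
from pathlib import Path


def write_png(path, rows):
    """Write an 8-bit RGB PNG; `rows` is a list of rows of (r, g, b) tuples."""
    height, width = len(rows), len(rows[0])
    # Each scanline starts with filter byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(c for px in row for c in px) for row in rows)

    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 2 (RGB), defaults for the rest.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    png = (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
           + chunk(b"IDAT", zlib.compress(raw)) + chunk(b"IEND", b""))
    Path(path).write_bytes(png)


class ToyEnv:
    """Hypothetical stand-in env: an agent moves along a 1 x 8 strip."""
    width = 8

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, min(self.width - 1, self.pos + action))
        done = self.pos == self.width - 1
        return self.pos, done

    def render(self):
        # One row of pixels: the agent is red, every other cell is white.
        return [[(255, 0, 0) if x == self.pos else (255, 255, 255)
                 for x in range(self.width)]]


def render_policy(env, policy, out_dir=None, max_steps=50):
    """Roll out `policy` in `env`, save one PNG per frame, return the folder."""
    out = Path(out_dir) if out_dir else Path(tempfile.mkdtemp(prefix="episode_"))
    out.mkdir(parents=True, exist_ok=True)
    obs = env.reset()
    write_png(out / "frame_0000.png", env.render())
    for t in range(1, max_steps + 1):
        obs, done = env.step(policy(obs))
        write_png(out / f"frame_{t:04d}.png", env.render())
        if done:
            break
    return out
```

Calling `render_policy(ToyEnv(), lambda obs: 1)` returns a temporary folder of numbered frames that an agent can open one at a time.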

A note: I also changed the default config so that the two renders are exposed as tools rather than primitives. I'm not sure we want to keep this as the default; I can change it back to how it was before.
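To make the tool-versus-primitive switch easier to discuss, here is a rough sketch of how a config could decide which way each render is exposed while the prompt still lists both kinds. Everything in it is hypothetical: the `Capability` class, the config shape, the `render_frame` name, and the placeholder function bodies are illustrative assumptions, and it deliberately does not use the MCP SDK or the repo's actual registry.

```python
import inspect
from dataclasses import dataclass
from typing import Callable


@dataclass
class Capability:
    name: str
    fn: Callable
    kind: str  # "primitive" (usable from approach.py) or "tool" (agent-only)


def render_frame(state):
    """Render a single state to an image."""
    raise NotImplementedError  # placeholder body, for illustration only


def render_policy(env_name, approach_path):
    """Record an episode of a policy as a folder of PNG frames."""
    raise NotImplementedError  # placeholder body, for illustration only


def build_capabilities(config):
    """Expose each render function as a tool or a primitive, per config."""
    fns = {"render_frame": render_frame, "render_policy": render_policy}
    return [Capability(name, fns[name], kind) for name, kind in config.items()]


def describe(cap):
    """Build one prompt line from the function's signature and docstring."""
    sig = inspect.signature(cap.fn)
    doc = inspect.getdoc(cap.fn) or ""
    summary = doc.splitlines()[0] if doc else ""
    return f"- [{cap.kind}] {cap.name}{sig}: {summary}"


def prompt_section(caps):
    """Tools and primitives are described to the agent in the same way."""
    return "\n".join(describe(c) for c in caps)


def primitives_for_approach(caps):
    """Only primitives are handed to approach.py; tools stay with the agent."""
    return {c.name: c.fn for c in caps if c.kind == "primitive"}
```

With a config of `{"render_frame": "tool", "render_policy": "tool"}`, both renders appear in the prompt section while `primitives_for_approach` returns an empty dict, which is the default this note describes.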

Jaraxxus-Me

In general this looks good. We need some experiments to see whether the two MCP tools help in solving the hard environments.
I have two points on this and would like to hear feedback:

1. We might want to distinguish Tools, which are available for Claude to "understand" the environment, from Primitives, which are available for Claude to "solve" the environment. I lean towards not asking Claude to use Tools in approach.py, but always providing these tools in the system prompt (so that it can use them to generate approach.py).
2. Point 1 also implies that we might not want another LLM/VLM in the solution, because we don't allow it to render the state as part of the solution. I think this is good: we stay consistent with the "state space" of the environment (that is, the solution does not try to change the original state space of the environment), and we do not allow infinite expressiveness in the solution (e.g., a solution that includes another Claude Code instance reading an image and writing new code at test time).

merlerm · 2026-03-09T17:27:47Z

Point 1 is right: MCP tools are used directly by the agent when solving the problem, but they can't be used in approach.py, so this PR already creates the divide between the two. I think that makes sense. In general, however, Claude can still render in the code if it wants to (it's just a couple of lines of code), and I have seen it do so even when it had the tools.

merlerm added 3 commits March 6, 2026 16:03

render_policy primitive initial implementation

5e20383

mcp tool implementations

6ab5482

linting/formatting

d86a8cc

merlerm requested review from Jaraxxus-Me and tomsilver March 9, 2026 12:57

Jaraxxus-Me reviewed Mar 9, 2026

merlerm wants to merge 3 commits into main from render_policy_primitive
