Draft
Jaraxxus-Me
reviewed
Mar 9, 2026
Collaborator
Jaraxxus-Me
left a comment
This looks good in general. We need some experiments to see whether the two MCP tools help in solving the hard environments.
I have two points on this and would like to hear feedback:
- We might want to distinguish the Tools available for Claude to "understand" the environment from the Primitives available for Claude to "solve" the environment. I lean towards not asking Claude to use Tools in approach.py, but always providing these tools in the system prompt (for it to generate approach.py).
- Point 1 also implies that we might not want another LLM/VLM in the solution, because we don't allow it to render the state as part of the solution. I think this is good: the solution stays consistent with the "state space" of the environment (that is, it does not try to change the environment's original state space), and we do not allow infinite expressiveness in the solution (e.g., if it included another Claude Code instance that reads an image and writes new code at test time).
Collaborator
Author
Point 1 is right: MCP tools are used directly by the agent when solving the problem, but they can't be used in approach.py, so the divide between the two is already in place. I think that makes sense. In general, however, Claude can still render in the code if it wants to (it's just a couple of lines of code), and I have seen it do so even when it had the tools.
Opening as a draft since I have only tested one run with this and we are now at the limit until next week. I'd like to test it a bit more to see how it works before merging, but if you have time, let me know what you think.
A note: I also changed the default config so that the two renders are passed as tools rather than primitives. I'm not sure we want to keep this as the default; I can change it back to how it was before.
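For reference, the tools-versus-primitives split could look roughly like the sketch below. It is only an illustration under assumptions: `CapabilityConfig`, `render_mode`, `split_capabilities`, and the capability names are hypothetical placeholders, not the identifiers or config keys used in this PR.

```python
from dataclasses import dataclass

# Hypothetical placeholder names for the two renders; the real names are not given here.
RENDER_CAPABILITIES = ("render_one", "render_two")


@dataclass
class CapabilityConfig:
    # "tool": renders are exposed to the agent (e.g. over MCP) while it writes approach.py.
    # "primitive": renders are callable from inside approach.py itself.
    render_mode: str = "tool"
    # Placeholder primitives that are always available to the solution.
    other_primitives: tuple = ("get_state", "step")


def split_capabilities(cfg):
    """Return (agent_tools, solution_primitives) for the given config."""
    if cfg.render_mode not in ("tool", "primitive"):
        raise ValueError(f"unknown render_mode: {cfg.render_mode!r}")
    tools = []
    primitives = list(cfg.other_primitives)
    if cfg.render_mode == "tool":
        tools.extend(RENDER_CAPABILITIES)
    else:
        primitives.extend(RENDER_CAPABILITIES)
    return tools, primitives


# Default: renders help the agent understand the environment but stay out of approach.py.
agent_tools, solution_primitives = split_capabilities(CapabilityConfig())
```

Switching `render_mode` back to `"primitive"` would correspond to reverting the default described in the note above.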