Agents for robot physical reasoning.
Work in progress.
```
git clone https://github.com/tomsilver/robocode.git
cd robocode
bash install.sh
```

The agentic approach requires the Claude Code CLI (`claude`). Authenticate via one of:
- Subscription (free usage): `claude auth login`
- API key: set `ANTHROPIC_API_KEY` in your environment
Optionally, set `ROBOCODE_CLAUDE_CMD` to point to a specific `claude` binary (defaults to `claude` on `PATH`).
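For example (hypothetical install path):

```
export ROBOCODE_CLAUDE_CMD=/usr/local/bin/claude
```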
The `model` parameter in `agentic.yaml` accepts CLI model aliases or full model IDs. Override per-run with e.g. `model=opus` on the command line.
| Alias | Full model ID |
|---|---|
| `sonnet` | `claude-sonnet-4-6` (latest Sonnet, default) |
| `opus` | `claude-opus-4-6` (latest Opus) |
| `haiku` | `claude-haiku-4-5-20251001` (latest Haiku) |
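For instance, combining the documented `model=opus` override with the experiment runner shown below (illustrative invocation):

```
python experiments/run_experiment.py approach=agentic environment=motion2d_easy model=opus
```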
Older model versions can also be used by specifying the full ID:
| Full model ID | Description |
|---|---|
| `claude-sonnet-4-5-20250929` | Claude Sonnet 4.5 |
| `claude-opus-4-5-20251101` | Claude Opus 4.5 |
| `claude-opus-4-1-20250805` | Claude Opus 4.1 |
| `claude-sonnet-4-20250514` | Claude Sonnet 4 |
| `claude-opus-4-20250514` | Claude Opus 4 |
| `claude-3-7-sonnet-20250219` | Claude 3.7 Sonnet |
See the Anthropic models overview for the full list.
All environments are available as Hydra configs via `environment=<config_name>`.
| Config | Description |
|---|---|
| `small_maze` | Small grid maze |
| `large_maze` | Large grid maze |
Kinder 2D environments:

| Config | Kinder ID | Difficulty |
|---|---|---|
| `motion2d_easy` | `kinder/Motion2D-p0-v0` | Easy (0 passages) |
| `motion2d_medium` | `kinder/Motion2D-p1-v0` | Medium (1 passage) |
| `motion2d_hard` | `kinder/Motion2D-p3-v0` | Hard (3 passages) |
| `obstruction2d_easy` | `kinder/Obstruction2D-o0-v0` | Easy (0 obstructions) |
| `obstruction2d_medium` | `kinder/Obstruction2D-o2-v0` | Medium (2 obstructions) |
| `obstruction2d_hard` | `kinder/Obstruction2D-o4-v0` | Hard (4 obstructions) |
| `clutteredretrieval2d_easy` | `kinder/ClutteredRetrieval2D-o1-v0` | Easy (1 obstruction) |
| `clutteredretrieval2d_medium` | `kinder/ClutteredRetrieval2D-o10-v0` | Medium (10 obstructions) |
| `clutteredretrieval2d_hard` | `kinder/ClutteredRetrieval2D-o25-v0` | Hard (25 obstructions) |
| `clutteredstorage2d_easy` | `kinder/ClutteredStorage2D-b1-v0` | Easy (1 block) |
| `clutteredstorage2d_medium` | `kinder/ClutteredStorage2D-b3-v0` | Medium (3 blocks) |
| `clutteredstorage2d_hard` | `kinder/ClutteredStorage2D-b7-v0` | Hard (7 blocks) |
| `stickbutton2d_easy` | `kinder/StickButton2D-b1-v0` | Easy (1 button) |
| `stickbutton2d_medium` | `kinder/StickButton2D-b3-v0` | Medium (3 buttons) |
| `stickbutton2d_hard` | `kinder/StickButton2D-b5-v0` | Hard (5 buttons) |
| `pushpullhook2d` | `kinder/PushPullHook2D-v0` | Single variant |
Kinder 3D environments:

| Config | Kinder ID | Difficulty |
|---|---|---|
| `motion3d` | `kinder/Motion3D-v0` | Single variant |
| `obstruction3d_easy` | `kinder/Obstruction3D-o0-v0` | Easy (0 obstructions) |
| `obstruction3d_medium` | `kinder/Obstruction3D-o2-v0` | Medium (2 obstructions) |
| `obstruction3d_hard` | `kinder/Obstruction3D-o4-v0` | Hard (4 obstructions) |
| `shelf3d_easy` | `kinder/Shelf3D-o1-v0` | Easy (1 cube) |
| `shelf3d_medium` | `kinder/Shelf3D-o3-v0` | Medium (3 cubes) |
| `shelf3d_hard` | `kinder/Shelf3D-o5-v0` | Hard (5 cubes) |
| `transport3d_easy` | `kinder/Transport3D-o1-v0` | Easy (1 cube) |
| `transport3d_hard` | `kinder/Transport3D-o2-v0` | Hard (2 cubes) |
| `packing3d_easy` | `kinder/Packing3D-p1-v0` | Easy (1 part) |
| `packing3d_medium` | `kinder/Packing3D-p2-v0` | Medium (2 parts) |
| `packing3d_hard` | `kinder/Packing3D-p3-v0` | Hard (3 parts) |
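Any config name from these tables drops straight into the experiment runner, e.g.:

```
python experiments/run_experiment.py approach=random environment=clutteredstorage2d_medium seed=0
```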
The agent runs inside a Docker container (`robocode-sandbox`) that provides full filesystem isolation, a restricted network, and a pre-built Python environment.
| Layer | Mechanism |
|---|---|
| Filesystem | Docker bind-mount: the agent can only write to `/sandbox` (the run's output dir) |
| Network | `init-firewall.sh` whitelists `api.anthropic.com`, GitHub IPs, and Claude telemetry; blocks everything else via iptables |
| Write hook | A `PreToolUse` hook in `.claude/settings.json` double-checks that Write/Edit paths stay inside `/sandbox` |
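The write hook's check boils down to a resolve-then-prefix test. A minimal sketch of that idea in Python (a hypothetical helper, not the actual hook code):

```python
from pathlib import Path

SANDBOX_ROOT = Path("/sandbox")

def is_write_allowed(target: str) -> bool:
    """Return True only if the resolved path stays inside /sandbox."""
    # Resolve symlinks and collapse ".." so the agent can't escape via aliases.
    resolved = Path(target).resolve()
    return resolved == SANDBOX_ROOT or SANDBOX_ROOT in resolved.parents
```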
| Path | Contents |
|---|---|
| `/sandbox/` | Working directory; the agent writes `approach.py`, test scripts, etc. here |
| `/sandbox/primitives/` | Source files from `src/robocode/primitives/` (read-only reference) |
| `/robocode/.venv/bin/python` | Python 3.11 with all robocode dependencies pre-installed |
| `/robocode/prpl-mono/` | Third-party packages, bind-mounted read-only from the host submodule |
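For example, a test script the agent writes into the working directory would run against the pre-built interpreter (illustrative invocation; `test_approach.py` is the script from the example run below):

```
/robocode/.venv/bin/python /sandbox/test_approach.py
```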
On macOS, open the Docker Desktop application and look for the status indicator in the bottom-left corner of the GUI; it should say "Docker Engine Running".

On Linux, start the Docker daemon (and enable it at boot):

```
sudo systemctl start docker
sudo systemctl enable docker
```
Build once from the repo root (rebuild when `pyproject.toml` / `uv.lock` change; not needed for prpl-mono code changes):

```
bash docker/build.sh
```

The original macOS Seatbelt / Linux bubblewrap sandbox is still available (`use_docker: false` in `agentic.yaml`) but has a known limitation: it restricts filesystem writes but allows reads of the entire host filesystem.
Red team the sandbox:

```
python integration_tests/red_team_sandbox.py           # OS-level
python integration_tests/red_team_sandbox.py --docker  # Docker
```

Run an experiment:

```
python experiments/run_experiment.py approach=random environment=small_maze seed=0
```

Run a sweep over multiple seeds and environments:

```
python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=random
```

Analyze results from one or more runs:

```
python experiments/analyze_results.py multirun/
```

The agentic approach launches a Claude agent during `train()`. The agent reads the environment source code, figures out the state/action space and dynamics, and writes a `GeneratedApproach` class that is used at evaluation time. The agent can also write and run test scripts against the real environment to verify its solution before committing.
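The interface such a class must expose can be read off the example output further down: a constructor taking the action and observation spaces, `reset(state, info)` at episode start, and `get_action(state)` per step. A minimal stub under that inference (not the actual base class; the random policy assumes a gym-style space with `.sample()`):

```python
class GeneratedApproach:
    """Skeleton of the interface inferred from the example run below."""

    def __init__(self, action_space, observation_space):
        self.action_space = action_space
        self.observation_space = observation_space

    def reset(self, state, info):
        """Called once at the start of each evaluation episode."""

    def get_action(self, state):
        """Called once per step; here, a random placeholder action."""
        return self.action_space.sample()
```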
By default the agent runs in the Docker sandbox (requires `bash docker/build.sh` once):

```
python experiments/run_experiment.py approach=agentic environment=motion2d_easy
```

To use the legacy OS-level sandbox instead:

```
python experiments/run_experiment.py approach=agentic environment=small_maze approach.use_docker=false
```

To skip re-generation and load a previously generated approach:
```
python experiments/run_experiment.py approach=agentic environment=small_maze \
    approach.load_dir=outputs/2026-02-16/16-00-41
```

Parallel sweeps each get their own container (named `robocode-sandbox-<uuid>`), so multiple runs never interfere:

```
python experiments/run_experiment.py -m seed=0,1,2 environment=small_maze,large_maze approach=agentic
```

Use the joblib launcher to run jobs in parallel locally:
```
python experiments/run_experiment.py -m \
    approach=agentic \
    approach.use_docker=true \
    seed=42,24,424,444,222 \
    'primitives=[]' \
    environment=motion2d_easy,obstruction2d_easy,clutteredretrieval2d_easy,clutteredstorage2d_easy,stickbutton2d_easy,pushpullhook2d \
    'hydra.sweep.dir=multirun/2026-02-23/no_primitives_5d_s42_24_424_444_222' \
    'hydra.sweep.subdir=s${seed}/${hydra:runtime.choices.environment}' \
    hydra/launcher=joblib hydra.launcher.n_jobs=4
```

The generated `approach.py` and full agent log are saved under `sandbox/` in the run's output directory (e.g. `outputs/2026-02-16/16-00-41/sandbox/`).
On `small_maze`, the agent independently discovered A* pathfinding and achieved a 100% solve rate with optimal path lengths (mean 2.3 steps across 10 episodes):

```
{
  "mean_eval_reward": -2.3,
  "mean_eval_steps": 2.3,
  "solve_rate": 1.0,
  "num_eval_tasks": 10
}
```

(The reward is -1 per step, so `mean_eval_reward` is simply the negative of `mean_eval_steps`.)

Generated `approach.py` (A* pathfinding)
"""Optimal approach for MazeEnv using A* pathfinding algorithm."""
import heapq
from typing import Optional
class GeneratedApproach:
"""Optimal maze solver using A* pathfinding."""
def __init__(self, action_space, observation_space):
self.action_space = action_space
self.observation_space = observation_space
self.planned_path: Optional[list[tuple[int, int]]] = None
self.path_index = 0
self.UP = 0
self.DOWN = 1
self.LEFT = 2
self.RIGHT = 3
self.action_to_delta = {
self.UP: (-1, 0),
self.DOWN: (1, 0),
self.LEFT: (0, -1),
self.RIGHT: (0, 1)
}
def reset(self, state, info):
self.planned_path = self._astar_search(state)
self.path_index = 0
def get_action(self, state):
if self.planned_path and self.path_index < len(self.planned_path) - 1:
next_pos = self.planned_path[self.path_index + 1]
dr = next_pos[0] - state.agent[0]
dc = next_pos[1] - state.agent[1]
for action, (delta_r, delta_c) in self.action_to_delta.items():
if (dr, dc) == (delta_r, delta_c):
self.path_index += 1
return action
return self._greedy_action(state)
def _astar_search(self, state) -> Optional[list[tuple[int, int]]]:
start, goal = state.agent, state.goal
heap = [(self._heuristic(start, goal), start, 0, [start])]
visited = set()
while heap:
_, current, g_score, path = heapq.heappop(heap)
if current in visited:
continue
visited.add(current)
if current == goal:
return path
r, c = current
for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
nr, nc = r + dr, c + dc
neighbor = (nr, nc)
if (0 <= nr < state.height and 0 <= nc < state.width
and neighbor not in state.obstacles
and neighbor not in visited):
new_g = g_score + 1
heapq.heappush(heap, (
new_g + self._heuristic(neighbor, goal),
neighbor, new_g, path + [neighbor]
))
return None
def _heuristic(self, pos1, pos2) -> int:
return abs(pos1[0] - pos2[0]) + abs(pos1[1] - pos2[1])
def _greedy_action(self, state) -> int:
dr = state.goal[0] - state.agent[0]
dc = state.goal[1] - state.agent[1]
if abs(dr) >= abs(dc):
return self.DOWN if dr > 0 else self.UP
return self.RIGHT if dc > 0 else self.LEFTAgent log (19 turns, $0.36)
The agent's workflow:

- Explored the sandbox directory and read `maze_env.py` and `base_env.py`
- Analyzed the state type (`_MazeState` with `agent`, `goal`, `obstacles`, `height`, `width`), action space (4 directional moves), and reward structure (-1 per step)
- Identified this as a shortest-path problem and implemented A* with a Manhattan-distance heuristic
- Wrote and ran four test scripts verifying correctness on simple mazes, complex mazes with obstacles, impossible mazes (graceful fallback), and optimality of path lengths (a sketch in that spirit follows this list)
- Cleaned up test files and produced the final `approach.py`
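A minimal self-contained check in that spirit, runnable after defining the `GeneratedApproach` class above. The `State` namedtuple is a hypothetical stand-in for the real `_MazeState` (same field names), so this is a sketch, not the agent's actual script:

```python
from collections import namedtuple

# Stand-in for the real _MazeState (hypothetical; same field names).
State = namedtuple("State", ["agent", "goal", "obstacles", "height", "width"])

state = State(agent=(0, 0), goal=(2, 2), obstacles={(1, 1)}, height=3, width=3)
approach = GeneratedApproach(action_space=None, observation_space=None)
approach.reset(state, info={})

# With the center cell blocked on a 3x3 grid, the optimal path is 4 moves.
assert approach.planned_path is not None
assert len(approach.planned_path) - 1 == 4
print("planned path:", approach.planned_path)
```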
```
18:32:05 Agent: I'll analyze the environment source files and create an optimal approach.py.
18:32:06 Tool call: Glob({'pattern': '**/*.py'})
18:32:08 Agent: Let me explore the main environment files to understand the structure:
18:32:09 Tool call: Read({'.../maze_env.py'})
18:32:09 Tool call: Read({'.../base_env.py'})
18:32:11 Agent: [Analyzed state/action space, identified A* as optimal strategy]
18:32:40 Tool call: Write({'.../approach.py'})
18:32:43 Agent: Now let me create a simple test script to verify the approach works:
18:33:00 Tool call: Write({'.../test_approach.py'})
18:33:03 Tool call: Bash({'command': 'python test_approach.py'})
18:33:07 Agent: Excellent! Testing on a more complex maze...
18:33:22 Tool call: Write({'.../test_complex_maze.py'})
18:33:26 Tool call: Bash({'command': 'python test_complex_maze.py'})
18:33:31 Agent: A* correctly found optimal 12-step path around all obstacles.
18:33:39 Tool call: Write({'.../test_real_state.py'})
18:33:54 Tool call: Bash({'command': 'python test_real_state.py'})
18:33:58 Agent: Works with actual _MazeState class.
18:34:09 Tool call: Write({'.../test_optimality.py'})
18:34:11 Tool call: Bash({'command': 'python test_optimality.py'})
18:34:13 Agent: All test cases produce optimal solutions.
18:34:14 Tool call: Bash({'command': 'rm test_*.py'})
18:34:25 Session done: turns=19, cost=$0.3628055, error=False
```