
Block PRs that modify multiple agent directories #219

Merged
PunchTheDev merged 7 commits into main from punch/block-cross-agent-pr-modification on Jun 3, 2026

@PunchTheDev
Owner

Attack vector being closed

A malicious miner could submit a PR that modifies both agents/their-agent/agent.py AND agents/someone-elses/agent.py. CI would evaluate only whichever path sorts first and post a green result. On merge, both files would land in main, silently overwriting the victim's agent. The victim's next merge would then run with corrupted code.

Fix

In the Find changed agent step (eval.yml), count the unique agents/*/ directories whose agent.py is touched by the PR. If there is more than one, exit 1 before any eval runs:

ERROR: PR modifies agent.py in multiple directories:
agents/alice-agent
agents/victim-agent
Each PR must touch exactly one agent directory.

The existing template exclusion (grep -v '^agents/template/') is preserved. Non-agent files in the same PR (e.g. a README.md inside the agent directory) are unaffected — only agent.py files are counted.
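
The guard itself lives in the workflow step, so the following is only a minimal Python sketch of the same counting rule, not the code in eval.yml. It assumes the list of changed paths is already available (how the workflow obtains it is not shown here), and the function names changed_agent_dirs and check_single_agent are hypothetical.

```python
from pathlib import PurePosixPath


def changed_agent_dirs(changed_files):
    """Return the sorted agents/<name> directories whose agent.py was changed."""
    dirs = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Count only agents/<name>/agent.py; other files in the directory are ignored.
        if len(parts) == 3 and parts[0] == "agents" and parts[2] == "agent.py":
            # Keep the existing template exclusion.
            if parts[1] != "template":
                dirs.add(f"agents/{parts[1]}")
    return sorted(dirs)


def check_single_agent(changed_files):
    """Return (exit_code, message); exit_code is 1 when several agents are touched."""
    dirs = changed_agent_dirs(changed_files)
    if len(dirs) > 1:
        lines = ["ERROR: PR modifies agent.py in multiple directories:"]
        lines += dirs
        lines.append("Each PR must touch exactly one agent directory.")
        return 1, "\n".join(lines)
    return 0, ""
```

An empty result from changed_agent_dirs corresponds to the found=false case, where eval is skipped.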

Test plan

  • PR touching one agents/*/agent.py → CI proceeds normally
  • PR touching two or more agents/*/agent.py → CI exits 1 with clear error before building Docker image
  • PR touching only non-agent files in agents/** → CI skips eval (found=false, unchanged behavior)

🤖 Generated with Claude Code

Punch and others added 7 commits June 3, 2026 13:14
Three independent fixes identified in scale readiness audit:

1. run_eval_pool.py: distinguish container crash (returncode != 0, no output)
   from bad JSON (container ran but output is garbage). Previously both
   showed "Invalid JSON output" — crash now shows "Container exited 137"
   with stderr tail, making OOM kills and segfaults debuggable by miners.

2. record_submissions.py: skip STEP files smaller than 200 bytes.
   The file is pre-created as 0 bytes before docker run so the container
   can write to it; if the container crashes mid-run the file stays empty.
   Storing an empty BLOB sets has_step=true for a submission with no
   geometry, breaking the 3D viewer for that entry.

3. score.yml: increase score-round timeout-minutes from 90 → 150.
   15 specs × ~180s each + Docker overhead ≈ 50 min per round; 90 min
   was dangerously close to the limit for slower specs under high load.
   eval.yml and hidden-eval remain at 90 min (3 specs each — sufficient).
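
For reference, fix 1 above can be sketched roughly as follows. This is an illustration of the crash-versus-bad-JSON distinction under stated assumptions, not the actual code in run_eval_pool.py: the function name classify_container_result, its parameters, and the five-line stderr tail are all hypothetical.

```python
import json


def classify_container_result(returncode, stdout, stderr, tail_lines=5):
    """Return (result, error); exactly one of the two is None."""
    if returncode != 0 and not stdout.strip():
        # Crash: the container exited non-zero without producing any output.
        # Exit code 137 is 128 + 9 (SIGKILL), which is what an OOM kill produces.
        tail = "\n".join(stderr.strip().splitlines()[-tail_lines:])
        message = f"Container exited {returncode}"
        if tail:
            message += f"\n{tail}"
        return None, message
    try:
        # The container produced output; it still has to parse as JSON.
        return json.loads(stdout), None
    except json.JSONDecodeError:
        return None, "Invalid JSON output"
```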

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
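
The STEP size check from fix 2 in the commit message above could look something like this. It is a sketch only: the helper name usable_step_file is hypothetical, and treating a missing file the same as an empty one is an assumption rather than something the commit states.

```python
import os

MIN_STEP_BYTES = 200  # threshold named in the commit message


def usable_step_file(path, min_bytes=MIN_STEP_BYTES):
    """Return True only if the STEP file exists and has at least min_bytes bytes."""
    try:
        # A file pre-created as 0 bytes and never written stays below the threshold.
        return os.path.getsize(path) >= min_bytes
    except OSError:
        # Assumption: a missing or unreadable file is skipped as well.
        return False
```

A caller would store the BLOB, and so set has_step=true, only when this returns True.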

A malicious PR could include changes to agents/alice/agent.py alongside
agents/bob/agent.py. CI would eval the first alphabetically, but on merge
both files land in main — silently overwriting another miner's agent.

Fix: in the 'Find changed agent' step, count unique agent subdirectories
touched by the PR. If more than one, exit 1 with a clear error message
before any eval runs. Each PR must touch exactly one agents/* directory.

The template directory is already excluded from eval. Non-agent file changes
(README, spec.txt) in the same PR are still allowed as before.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@PunchTheDev merged commit 9c9178a into main on Jun 3, 2026
1 check failed