Problem
Current behavior: execute_submission assumes the container's CWD is the cloned repo and immediately runs git apply after prepare_environment. There are no explicit checks that:
- The working directory is /app/project.
- The clone succeeded and is a git repo.
- HEAD matches instance.base_commit.
- The worktree is clean (no stray files/changes).
Why it matters: If the clone or checkout silently fails, git apply may run in the wrong directory or against the wrong commit. This yields misleading results (e.g., accidental success/failure), erodes reproducibility, and makes debugging hard. Explicit guardrails prevent whole classes of flaky failures and produce clearer errors.
Proposal
Before any patch checks:
- Assert we are inside a git repo: git rev-parse --is-inside-work-tree.
- Assert current directory is /app/project (or verify repo root via git rev-parse --show-toplevel equals /app/project).
- Assert git rev-parse --verify HEAD equals instance.base_commit.
- Assert worktree is clean: git status --porcelain is empty.
- If any check fails, append a structured CommandOutput to the trace with a clear message and stop gracefully (return trace, not raise), so metrics can still be computed and logs are actionable.
- Add tests (integration/unit with mocks) covering:
  - Correct HEAD and clean worktree.
  - Mismatched HEAD.
  - Dirty worktree.
  - Not a git repo / wrong directory.
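The checks above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the CommandOutput fields, the Runner type, run_git, check_repo_preconditions, and preconditions_ok are all hypothetical names, and the real harness presumably runs commands inside the container rather than through subprocess. It also assumes instance.base_commit is a full commit hash, since git rev-parse --verify HEAD prints the full hash.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence

EXPECTED_ROOT = "/app/project"  # path taken from the issue text


@dataclass
class CommandOutput:
    # Hypothetical shape; the real CommandOutput in the codebase may differ.
    command: str
    stdout: str
    stderr: str
    exit_code: int
    message: str = ""


# A runner takes git arguments (without the leading "git") and returns a CommandOutput.
Runner = Callable[[Sequence[str]], CommandOutput]


def run_git(args: Sequence[str]) -> CommandOutput:
    """Stand-in runner: execute git in the current working directory."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return CommandOutput("git " + " ".join(args), proc.stdout, proc.stderr, proc.returncode)


def check_repo_preconditions(
    base_commit: str,
    run: Runner = run_git,
    expected_root: str = EXPECTED_ROOT,
) -> List[CommandOutput]:
    """Run the four checks in order and stop at the first failure.

    Returns the trace entries. If a check failed, the last entry carries a
    non-empty `message`; nothing is raised.
    """
    checks = [
        (["rev-parse", "--is-inside-work-tree"],
         lambda out: out == "true",
         "not inside a git work tree"),
        (["rev-parse", "--show-toplevel"],
         lambda out: out == expected_root,
         f"repo root is not {expected_root}"),
        (["rev-parse", "--verify", "HEAD"],
         lambda out: out == base_commit,  # assumes base_commit is a full hash
         f"HEAD does not match base commit {base_commit}"),
        (["status", "--porcelain"],
         lambda out: out == "",
         "worktree is not clean"),
    ]
    trace: List[CommandOutput] = []
    for args, is_ok, failure in checks:
        result = run(args)
        trace.append(result)
        if result.exit_code != 0 or not is_ok(result.stdout.strip()):
            result.message = f"precondition failed: {failure}"
            return trace  # stop gracefully: return the trace, do not raise
    return trace


def preconditions_ok(trace: List[CommandOutput]) -> bool:
    """True when every check ran and none recorded a failure message."""
    return bool(trace) and not trace[-1].message
```

Passing the runner as a parameter is one way to make the four test scenarios above easy to cover with a fake that returns canned output, without needing a real repository.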
Acceptance criteria
- Evaluation aborts early with clear trace output if repo preconditions fail.
- No git apply runs unless all repo checks pass.
- Trace includes the failing check's stdout/stderr and exit code.
- Tests added and passing.
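One way to verify the "no git apply unless all checks pass" criterion is a recording test double that notes every command it is asked to run. The sketch below is illustrative only: guarded_apply, RecordingRunner, the dict-based trace entries, and the patch.diff file name are hypothetical stand-ins, not the project's execute_submission or CommandOutput.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, ...]
Result = Tuple[int, str, str]  # (exit_code, stdout, stderr)


class RecordingRunner:
    """Test double: returns canned results and records each command it sees."""

    def __init__(self, responses: Dict[Command, Result]):
        self.responses = responses
        self.calls: List[Command] = []

    def __call__(self, cmd: Command) -> Result:
        self.calls.append(tuple(cmd))
        # Commands without a canned response succeed with empty output.
        return self.responses.get(tuple(cmd), (0, "", ""))


def guarded_apply(run, base_commit: str, patch_path: str = "patch.diff",
                  root: str = "/app/project") -> List[dict]:
    """Hypothetical gate: run `git apply` only if every precondition holds."""
    checks = [
        (("git", "rev-parse", "--is-inside-work-tree"), "true"),
        (("git", "rev-parse", "--show-toplevel"), root),
        (("git", "rev-parse", "--verify", "HEAD"), base_commit),
        (("git", "status", "--porcelain"), ""),
    ]
    trace: List[dict] = []
    for cmd, expected in checks:
        code, out, err = run(cmd)
        entry = {"command": " ".join(cmd), "exit_code": code,
                 "stdout": out, "stderr": err}
        trace.append(entry)
        if code != 0 or out.strip() != expected:
            entry["message"] = f"precondition failed: expected {expected!r}"
            return trace  # abort early; git apply is never invoked
    code, out, err = run(("git", "apply", patch_path))
    trace.append({"command": f"git apply {patch_path}", "exit_code": code,
                  "stdout": out, "stderr": err})
    return trace
```

A test can then assert on runner.calls: with a mismatched HEAD, no recorded command should start with ("git", "apply"), and the last trace entry should carry the failing check's stdout, stderr, and exit code.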