Add generic Agent interface and runner by tomerqodo · Pull Request #1 · tomerqodo/preparedness

tomerqodo · 2025-07-30T08:03:42Z

User description

Summary

implement AgentBase for unified agent interactions
add Docker environment utilities for starting tasks
create run_agent.py entrypoint to run a task with an agent
provide a gold patch agent and integration test

Testing

pytest project/swelancer/tests/test_run_agent.py -q

https://chatgpt.com/codex/tasks/task_b_6889cae68290832898ff73a678394c36

PR Type

Enhancement

Description

Add AgentBase abstract interface for unified agent interactions
Implement Docker utilities for task environment management
Create run_agent.py entrypoint for executing tasks with agents
Provide gold patch agent and integration test

Diagram Walkthrough

flowchart LR
  AgentBase["AgentBase Interface"] --> GoldPatchAgent["GoldPatchAgent Implementation"]
  DockerUtils["Docker Utilities"] --> RunAgent["run_agent.py Runner"]
  RunAgent --> AgentBase
  RunAgent --> DockerUtils
  TestRunner["Integration Test"] --> GoldPatchAgent
  TestRunner --> RunAgent

File Walkthrough

Relevant files

Configuration changes

__init__.py `Initialize agent module exports` project/swelancer/swelancer/agent/init.py Export `AgentBase` class from the agent module	+3/-0

Enhancement

agent_base.py `Abstract base class for SWE-Lancer agents` project/swelancer/swelancer/agent/agent_base.py Define abstract `AgentBase` class with `predict` method Support arbitrary configuration via `**config` parameter Handle both patch and choice output schemas Provide comprehensive docstrings for interface contract	+50/-0
docker_utils.py `Docker environment management utilities` project/swelancer/swelancer/agent/docker_utils.py Implement `start_task_environment` for Docker container setup Add `teardown_task_environment` for cleanup Support both monolith and task-specific images Detect task type from task ID	+68/-0
run_agent.py `Command-line agent runner implementation` project/swelancer/swelancer/run_agent.py Create CLI entrypoint for running agents on tasks Load problem statements from issue data files Handle both patch and choice output schemas Apply patches and run tests in Docker containers Validate manager task solutions against gold answers	+109/-0

Tests

gold_patch_agent.py `Gold standard agent for testing` project/swelancer/tests/gold_patch_agent.py Implement test agent that returns correct patches Reverse bug reintroduction patches to create fixes Extend `AgentBase` with task-specific initialization	+41/-0
test_run_agent.py `Integration test for agent runner` project/swelancer/tests/test_run_agent.py Add end-to-end integration test for agent runner Test Docker environment setup and teardown Validate patch application and test execution Skip test when Docker is unavailable	+44/-0

qodo-code-review · 2025-07-30T08:04:15Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns Command injection: The code in `run_agent.py` constructs shell commands using string formatting with user-controlled input. In line 87, `dest` variable containing JSON data is directly interpolated into a bash command without proper escaping, which could allow command injection if the solution contains malicious content.
⚡ Recommended focus areas for review Error Handling The `start_task_environment` function uses `subprocess.check_output` which will raise an exception on failure, but there's no error handling. This could cause the entire process to crash if Docker commands fail. container_id = ( subprocess.check_output( [ "docker", "run", "-d", "--rm", "--entrypoint", "/app/runtime_scripts/run.sh", image, ] ) .decode() .strip() ) Resource Cleanup The Docker container teardown only happens at the end of main function. If an exception occurs during execution, the container may not be properly cleaned up, leading to resource leaks. teardown_task_environment(env_info) Hardcoded Path The code contains hardcoded paths like `/app/expensify` which makes it brittle and less portable. These paths should be configurable or derived from environment information. "git apply - && cd /app/expensify && pytest -q", ] res = subprocess.run( apply_cmd, input=str(solution).encode(), capture_output=True ) success = res.returncode == 0 print(res.stdout.decode()) else: dest = json.dumps({"proposal_id": solution}) subprocess.run( [ "docker", "exec", env_info["container_id"], "bash", "-c", f"echo '{dest}' > /app/expensify/manager_decisions.json",

qodo-code-review · 2025-07-30T08:04:56Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Add Docker subprocess error handling The subprocess call lacks error handling and could fail silently or with unclear error messages. Add proper exception handling to provide meaningful feedback when Docker operations fail. project/swelancer/swelancer/agent/docker_utils.py [38-52] -container_id = ( - subprocess.check_output( - [ - "docker", - "run", - "-d", - "--rm", - "--entrypoint", - "/app/runtime_scripts/run.sh", - image, - ] +try: + container_id = ( + subprocess.check_output( + [ + "docker", + "run", + "-d", + "--rm", + "--entrypoint", + "/app/runtime_scripts/run.sh", + image, + ] + ) + .decode() + .strip() ) - .decode() - .strip() -) +except subprocess.CalledProcessError as e: + raise RuntimeError(f"Failed to start Docker container for image {image}: {e}") Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that the `subprocess.check_output` call lacks explicit error handling, and adding it improves the function's robustness and provides better error messages.	Medium
General	Include stderr in subprocess output The code only prints stdout but ignores stderr, which could contain important error information when patch application or tests fail. Include stderr output for better debugging. project/swelancer/swelancer/run_agent.py [73-77] res = subprocess.run( apply_cmd, input=str(solution).encode(), capture_output=True ) success = res.returncode == 0 print(res.stdout.decode()) +if res.stderr: + print(res.stderr.decode()) Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly points out that ignoring `stderr` can hide crucial error information, and printing it significantly improves the script's debuggability when the patch or tests fail.	Medium
General	Add CSV parsing error handling The CSV parsing logic is fragile and could fail if the file doesn't exist or has unexpected format. Add proper error handling for file operations and JSON parsing. project/swelancer/swelancer/run_agent.py [91-101] -with open( - Path(__file__).resolve().parent.parent / "all_swelancer_tasks.csv" -) as f: - lines = [ - l for l in f.read().splitlines() if l.startswith(args.task_id + ",") - ] -gold_choice = None -if lines: - gold_choice = json.loads(lines[0].split(",")[-1])["game"][ - "correct_proposal" - ]["id"] +try: + with open( + Path(__file__).resolve().parent.parent / "all_swelancer_tasks.csv" + ) as f: + lines = [ + l for l in f.read().splitlines() if l.startswith(args.task_id + ",") + ] + gold_choice = None + if lines: + gold_choice = json.loads(lines[0].split(",")[-1])["game"][ + "correct_proposal" + ]["id"] +except (FileNotFoundError, json.JSONDecodeError, KeyError, IndexError) as e: + print(f"Warning: Could not load gold answer for validation: {e}") + gold_choice = None Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that the file and JSON parsing logic is brittle, and adding a `try-except` block makes the script more robust against a missing file or malformed data.	Medium
More

Add AgentBase interface and runner with docker utilities

97b0dc5

tomerqodo added the codex label Jul 30, 2025 — with ChatGPT Codex Connector

qodo-code-review Bot added Review effort 4/5 Possible security concern labels Jul 30, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add generic Agent interface and runner#1

Add generic Agent interface and runner#1
tomerqodo wants to merge 1 commit into
mainfrom
codex/integrate-agent-class-for-swelancer

tomerqodo commented Jul 30, 2025 •

edited by qodo-code-review Bot

Loading

Uh oh!

qodo-code-review Bot commented Jul 30, 2025

Uh oh!

qodo-code-review Bot commented Jul 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

tomerqodo commented Jul 30, 2025 • edited by qodo-code-review Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

Summary

Testing

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review Bot commented Jul 30, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-code-review Bot commented Jul 30, 2025

PR Code Suggestions ✨

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tomerqodo commented Jul 30, 2025 •

edited by qodo-code-review Bot

Loading