Skip to content

Add generic Agent interface and runner#1

Open
tomerqodo wants to merge 1 commit into
mainfrom
codex/integrate-agent-class-for-swelancer
Open

Add generic Agent interface and runner#1
tomerqodo wants to merge 1 commit into
mainfrom
codex/integrate-agent-class-for-swelancer

Conversation

@tomerqodo

@tomerqodo tomerqodo commented Jul 30, 2025

Copy link
Copy Markdown
Owner

User description

Summary

  • implement AgentBase for unified agent interactions
  • add Docker environment utilities for starting tasks
  • create run_agent.py entrypoint to run a task with an agent
  • provide a gold patch agent and integration test

Testing

  • pytest project/swelancer/tests/test_run_agent.py -q

https://chatgpt.com/codex/tasks/task_b_6889cae68290832898ff73a678394c36


PR Type

Enhancement


Description

  • Add AgentBase abstract interface for unified agent interactions

  • Implement Docker utilities for task environment management

  • Create run_agent.py entrypoint for executing tasks with agents

  • Provide gold patch agent and integration test


Diagram Walkthrough

flowchart LR
  AgentBase["AgentBase Interface"] --> GoldPatchAgent["GoldPatchAgent Implementation"]
  DockerUtils["Docker Utilities"] --> RunAgent["run_agent.py Runner"]
  RunAgent --> AgentBase
  RunAgent --> DockerUtils
  TestRunner["Integration Test"] --> GoldPatchAgent
  TestRunner --> RunAgent
Loading

File Walkthrough

Relevant files
Configuration changes
__init__.py
Initialize agent module exports                                                   

project/swelancer/swelancer/agent/init.py

  • Export AgentBase class from the agent module
+3/-0     
Enhancement
agent_base.py
Abstract base class for SWE-Lancer agents                               

project/swelancer/swelancer/agent/agent_base.py

  • Define abstract AgentBase class with predict method
  • Support arbitrary configuration via **config parameter
  • Handle both patch and choice output schemas
  • Provide comprehensive docstrings for interface contract
+50/-0   
docker_utils.py
Docker environment management utilities                                   

project/swelancer/swelancer/agent/docker_utils.py

  • Implement start_task_environment for Docker container setup
  • Add teardown_task_environment for cleanup
  • Support both monolith and task-specific images
  • Detect task type from task ID
+68/-0   
run_agent.py
Command-line agent runner implementation                                 

project/swelancer/swelancer/run_agent.py

  • Create CLI entrypoint for running agents on tasks
  • Load problem statements from issue data files
  • Handle both patch and choice output schemas
  • Apply patches and run tests in Docker containers
  • Validate manager task solutions against gold answers
+109/-0 
Tests
gold_patch_agent.py
Gold standard agent for testing                                                   

project/swelancer/tests/gold_patch_agent.py

  • Implement test agent that returns correct patches
  • Reverse bug reintroduction patches to create fixes
  • Extend AgentBase with task-specific initialization
+41/-0   
test_run_agent.py
Integration test for agent runner                                               

project/swelancer/tests/test_run_agent.py

  • Add end-to-end integration test for agent runner
  • Test Docker environment setup and teardown
  • Validate patch application and test execution
  • Skip test when Docker is unavailable
+44/-0   

@qodo-code-review

Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns

Command injection:
The code in run_agent.py constructs shell commands using string formatting with user-controlled input. In line 87, dest variable containing JSON data is directly interpolated into a bash command without proper escaping, which could allow command injection if the solution contains malicious content.

⚡ Recommended focus areas for review

Error Handling

The start_task_environment function uses subprocess.check_output which will raise an exception on failure, but there's no error handling. This could cause the entire process to crash if Docker commands fail.

container_id = (
    subprocess.check_output(
        [
            "docker",
            "run",
            "-d",
            "--rm",
            "--entrypoint",
            "/app/runtime_scripts/run.sh",
            image,
        ]
    )
    .decode()
    .strip()
)
Resource Cleanup

The Docker container teardown only happens at the end of main function. If an exception occurs during execution, the container may not be properly cleaned up, leading to resource leaks.

teardown_task_environment(env_info)
Hardcoded Path

The code contains hardcoded paths like /app/expensify which makes it brittle and less portable. These paths should be configurable or derived from environment information.

        "git apply - && cd /app/expensify && pytest -q",
    ]
    res = subprocess.run(
        apply_cmd, input=str(solution).encode(), capture_output=True
    )
    success = res.returncode == 0
    print(res.stdout.decode())
else:
    dest = json.dumps({"proposal_id": solution})
    subprocess.run(
        [
            "docker",
            "exec",
            env_info["container_id"],
            "bash",
            "-c",
            f"echo '{dest}' > /app/expensify/manager_decisions.json",

@qodo-code-review

Copy link
Copy Markdown

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add Docker subprocess error handling

The subprocess call lacks error handling and could fail silently or with unclear
error messages. Add proper exception handling to provide meaningful feedback
when Docker operations fail.

project/swelancer/swelancer/agent/docker_utils.py [38-52]

-container_id = (
-    subprocess.check_output(
-        [
-            "docker",
-            "run",
-            "-d",
-            "--rm",
-            "--entrypoint",
-            "/app/runtime_scripts/run.sh",
-            image,
-        ]
+try:
+    container_id = (
+        subprocess.check_output(
+            [
+                "docker",
+                "run",
+                "-d",
+                "--rm",
+                "--entrypoint",
+                "/app/runtime_scripts/run.sh",
+                image,
+            ]
+        )
+        .decode()
+        .strip()
     )
-    .decode()
-    .strip()
-)
+except subprocess.CalledProcessError as e:
+    raise RuntimeError(f"Failed to start Docker container for image {image}: {e}")
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the subprocess.check_output call lacks explicit error handling, and adding it improves the function's robustness and provides better error messages.

Medium
General
Include stderr in subprocess output

The code only prints stdout but ignores stderr, which could contain important
error information when patch application or tests fail. Include stderr output
for better debugging.

project/swelancer/swelancer/run_agent.py [73-77]

 res = subprocess.run(
     apply_cmd, input=str(solution).encode(), capture_output=True
 )
 success = res.returncode == 0
 print(res.stdout.decode())
+if res.stderr:
+    print(res.stderr.decode())
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that ignoring stderr can hide crucial error information, and printing it significantly improves the script's debuggability when the patch or tests fail.

Medium
Add CSV parsing error handling

The CSV parsing logic is fragile and could fail if the file doesn't exist or has
unexpected format. Add proper error handling for file operations and JSON
parsing.

project/swelancer/swelancer/run_agent.py [91-101]

-with open(
-    Path(__file__).resolve().parent.parent / "all_swelancer_tasks.csv"
-) as f:
-    lines = [
-        l for l in f.read().splitlines() if l.startswith(args.task_id + ",")
-    ]
-gold_choice = None
-if lines:
-    gold_choice = json.loads(lines[0].split(",")[-1])["game"][
-        "correct_proposal"
-    ]["id"]
+try:
+    with open(
+        Path(__file__).resolve().parent.parent / "all_swelancer_tasks.csv"
+    ) as f:
+        lines = [
+            l for l in f.read().splitlines() if l.startswith(args.task_id + ",")
+        ]
+    gold_choice = None
+    if lines:
+        gold_choice = json.loads(lines[0].split(",")[-1])["game"][
+            "correct_proposal"
+        ]["id"]
+except (FileNotFoundError, json.JSONDecodeError, KeyError, IndexError) as e:
+    print(f"Warning: Could not load gold answer for validation: {e}")
+    gold_choice = None
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the file and JSON parsing logic is brittle, and adding a try-except block makes the script more robust against a missing file or malformed data.

Medium
  • More

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant