Autonomous software engineering fleet of AI agents for production-grade PRs on AgentField: plan, code, test, and ship.

SWE-AF

Autonomous Engineering Team Runtime Built on AgentField

Pronounced: "swee-AF" (one word)


Quick Start · Why SWE-AF · In Action · Factory Control · Benchmark · API · Architecture Doc

One API call spins up a full autonomous engineering team that can scope, build, adapt, and ship complex software end to end. SWE-AF is a first step toward autonomous software engineering factories, scaling from simple goals to hard multi-issue programs with hundreds to thousands of agent invocations.

SWE-AF autonomous engineering fleet banner

One-Call DX

curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Refactor and harden auth + billing flows",
    "repo_url": "https://github.com/user/my-project",
    "config": {
      "runtime": "claude_code",
      "models": {
        "default": "sonnet",
        "coder": "opus",
        "qa": "opus"
      },
      "enable_learning": true
    }
  }
}
JSON

Set models.default and any role key (coder, qa, architect, etc.) to any model your runtime supports.

Autonomous Build Spotlight

Rust-based Python compiler benchmark (built autonomously):

Metric CPython (subprocess) RustPython (SWE-AF) Improvement
Steady-state execution Baseline Optimized in-process runtime 88.3x-602.3x faster
Geometric mean 1.0x baseline 253.8x 253.8x
Peak throughput N/A 31,807 ops/s 31,807 ops/s

Artifact trail includes 175 tracked autonomous agents across planning, coding, review, merge, and verification.

Details: examples/llm-rust-python-compiler-sonnet/README.md

Why SWE-AF

Most agent frameworks are harnesses around a single coder loop. SWE-AF is a software engineering factory built from coordinated harnesses.

  • Hardness-aware execution: easy issues pass through quickly, while hard issues trigger deeper adaptation and DAG-level replanning instead of blind retries.
  • Factory architecture: planning, execution, and governance agents run as a coordinated control stack.
  • Continual learning (optional): with enable_learning=true, conventions and failure patterns discovered early are injected into downstream issues.
  • Agent-scale parallelism: dependency-level scheduling + isolated git worktrees allow large fan-out without branch collisions.
  • Fleet-scale orchestration with AgentField: many SWE-AF nodes can run continuously in parallel, driving thousands of agent invocations across concurrent builds.
  • Explicit compromise tracking: when scope is relaxed, debt is typed, severity-rated, and propagated.
  • Long-run reliability: checkpointed execution supports resume_build after crashes or interruptions.

In Action

PR #179: Go SDK DID/VC Registration — built entirely by SWE-AF (Claude runtime with haiku-class models). One API call, zero human code.

Metric Value
Issues completed 10/10
Tests passing 217
Acceptance criteria 34/34
Agent invocations 79
Model claude-haiku-4-5
Total cost $19.23
Cost breakdown by agent role
Role Cost %
Coder $5.88 30.6%
Code Reviewer $3.48 18.1%
QA $1.78 9.2%
GitHub PR $1.66 8.6%
Integration Tester $1.59 8.3%
Merger $1.22 6.3%
Workspace Ops $1.77 9.2%
Planning (PM + Arch + TL + Sprint) $0.79 4.1%
Verifier + Finalize $0.34 1.8%
Synthesizer $0.05 0.2%

79 invocations, 2,070 conversation turns. Planning agents scope and decompose; coders work in parallel in isolated worktrees; reviewers and QA validate each issue; the merger integrates branches; the verifier checks acceptance criteria against the PRD.

Claude & open-source models supported: Run builds with either runtime and tune models per role in one flat config map.

  • runtime: "claude_code" maps to Claude backend.
  • runtime: "open_code" maps to OpenCode backend (OpenRouter/OpenAI/Google/Anthropic model IDs).

Adaptive Factory Control

SWE-AF uses three nested control loops to adapt to task difficulty in real time:

Loop Scope Trigger Action
Inner loop Single issue QA/review fails Coder retries with feedback
Middle loop Single issue Inner loop exhausted run_issue_advisor retries with a new approach, splits work, or accepts with debt
Outer loop Remaining DAG Escalated failures run_replanner restructures remaining issues and dependencies

This is the core factory-control behavior: control agents supervise worker agents and continuously reshape the plan as reality changes.
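The three loops can be sketched as nested retry budgets. This is an illustrative simulation only: the function and key names (`attempt`, `advise`, `replan`) are hypothetical stand-ins for the real coder, advisor, and replanner agents, and the budgets mirror the `max_coding_iterations`, `max_advisor_invocations`, and `max_replans` config keys described later.

```python
# Illustrative sketch of SWE-AF's three nested control loops.
# "attempt", "advise", and "replan" are hypothetical callables standing
# in for the coder/QA pass, run_issue_advisor, and run_replanner.

def run_issue(issue, max_coding_iterations=5, max_advisor_invocations=2):
    """Inner + middle loop for a single issue."""
    for _ in range(max_advisor_invocations + 1):      # middle loop
        for _ in range(max_coding_iterations):        # inner loop
            if issue["attempt"]() == "pass":          # coder + QA/review pass
                return "done"
        # inner loop exhausted: advisor adapts the approach or escalates
        if issue["advise"]() == "escalate":
            return "escalate"
    return "escalate"

def run_build(issues, max_replans=2):
    """Outer loop: replan the remaining DAG on escalated failures."""
    replans = 0
    remaining = list(issues)
    while remaining:
        issue = remaining.pop(0)
        if run_issue(issue) == "escalate":
            if replans >= max_replans:
                return "failed"
            replans += 1
            # replanner restructures the remaining issues (identity here)
            remaining = issue.get("replan", lambda rest: rest)(remaining)
    return "verified"
```

An issue that passes on the first attempt exits via the inner loop; one that keeps failing burns its advisor budget and escalates to the outer loop, which replans at most `max_replans` times before the build fails.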

Quick Start

1. Requirements

  • Python 3.12+
  • AgentField control plane (af)
  • AI provider API key (Anthropic, OpenRouter, OpenAI, or Google)

2. Install

python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"

3. Run

af                 # starts AgentField control plane on :8080
python -m swe_af   # registers node id "swe-planner"

4. Trigger a build

# Default (uses Claude)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Add JWT auth to all API endpoints",
    "repo_url": "https://github.com/user/my-project"
  }
}
JSON

# With open-source runtime + flat role map
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Add JWT auth",
    "repo_url": "https://github.com/user/my-project",
    "config": {
      "runtime": "open_code",
      "models": {
        "default": "openrouter/minimax/minimax-m2.5"
      }
    }
  }
}
JSON

# Local workspace mode (repo_path) + targeted role override
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Refactor and harden auth + billing flows",
    "repo_path": "/path/to/repo",
    "config": {
      "runtime": "claude_code",
      "models": {
        "default": "sonnet",
        "coder": "opus",
        "qa": "opus"
      },
      "enable_learning": true
    }
  }
}
JSON

For OpenRouter with open_code, use model IDs in openrouter/<provider>/<model> format (for example openrouter/minimax/minimax-m2.5).
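A quick way to sanity-check IDs before submitting a build is a small format validator. This helper is not part of SWE-AF; it is a sketch of the `openrouter/<provider>/<model>` convention described above.

```python
# Illustrative validator (not part of SWE-AF) for the
# openrouter/<provider>/<model> ID format used with open_code.

def is_openrouter_id(model_id: str) -> bool:
    parts = model_id.split("/")
    # at least three non-empty segments, starting with "openrouter"
    return len(parts) >= 3 and parts[0] == "openrouter" and all(parts)
```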

What Happens In One Build

  • Architecture is generated and reviewed before coding starts
  • Issues are dependency-sorted and run in parallel across isolated worktrees
  • Each issue gets dedicated coder, tester, and reviewer passes
  • Failed issues trigger advisor-driven adaptation (split, re-scope, or escalate)
  • Escalations trigger replanning of the remaining DAG
  • End result is merged, integration-tested, and verified against acceptance criteria

SWE-AF architecture

Typical runs spin up 400-500+ agent instances across planning, execution, QA, and verification. For larger DAGs and repeated adaptation/replanning cycles, SWE-AF can scale into the high hundreds to thousands of agent invocations in a single build.

Benchmark Snapshot

95/100 with haiku and MiniMax: SWE-AF hit that score with both Claude haiku-class routing (~$20) and MiniMax M2.5 via the open runtime (~$6), outperforming Claude Code sonnet (73), Codex o3 (62), and Claude Code haiku (59) on the same prompt.

Dimension SWE-AF (haiku) SWE-AF (MiniMax) CC Sonnet Codex (o3) CC Haiku
Functional (30) 30 30 30 30 30
Structure (20) 20 20 10 10 10
Hygiene (20) 20 20 16 10 7
Git (15) 15 15 2 2 2
Quality (15) 10 10 15 10 10
Total 95 95 73 62 59
Cost ~$20 ~$6 ? ? ?
Time ~30-40 min 43 min ? ? ?
Full benchmark details and reproduction

Same prompt tested across multiple agents. SWE-AF with Claude runtime (haiku-class model mapping) used 400+ agent instances; SWE-AF with MiniMax M2.5 via open runtime achieved identical quality at 70% cost savings.

Prompt used for all agents:

Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.

Scoring framework

Dimension Points What it measures
Functional 30 CLI behavior and passing tests
Structure 20 Modular source layout and test organization
Hygiene 20 .gitignore, clean status, no junk artifacts
Git 15 Commit discipline and message quality
Quality 15 Error handling, package metadata, README quality

Reproduction

# SWE-AF (Claude runtime, haiku-class mapping) - $20, 30-40 min
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.",
    "repo_path": "/tmp/swe-af-output",
    "config": {
      "runtime": "claude_code",
      "models": {
        "default": "haiku"
      }
    }
  }
}
JSON

# SWE-AF (MiniMax M2.5 via OpenRouter runtime) - $6, 43 min
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work.",
    "repo_path": "/workspaces/todo-app-benchmark",
    "config": {
      "runtime": "open_code",
      "models": {
        "default": "openrouter/minimax/minimax-m2.5"
      }
    }
  }
}
JSON

# Claude Code (haiku)
claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model haiku --dangerously-skip-permissions

# Claude Code (sonnet)
claude -p "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --model sonnet --dangerously-skip-permissions

# Codex (gpt-5.3-codex)
codex exec "Build a Node.js CLI todo app with add, list, complete, and delete commands. Data should persist to a JSON file. Initialize git, write tests, and commit your work." --full-auto

MiniMax M2.5 Measured Metrics (Feb 2026):

  • 99.22% code coverage (only agent with measured coverage)
  • 4 custom error types (TodoError, ValidationError, NotFoundError, StorageError)
  • 999 LOC, 4 modules, 74 tests, 9 commits

Production Quality Analysis: Objective comparison of measurable metrics across all agents.

Benchmark assets, logs, evaluator, and generated projects live in examples/agent-comparison/.

Docker

cp .env.example .env
# Add your API key: ANTHROPIC_API_KEY, OPENROUTER_API_KEY, OPENAI_API_KEY, or GOOGLE_API_KEY
# Optionally add GH_TOKEN for draft PR workflow

docker compose up -d

Submit a build:

# Default (Claude)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Add JWT auth",
    "repo_url": "https://github.com/user/my-repo"
  }
}
JSON

# With open-source runtime (set OPENROUTER_API_KEY in .env)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Add JWT auth",
    "repo_url": "https://github.com/user/my-repo",
    "config": {
      "runtime": "open_code",
      "models": {
        "default": "openrouter/minimax/minimax-m2.5"
      }
    }
  }
}
JSON

# Local workspace mode (repo_path)
curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "goal": "Add JWT auth",
    "repo_path": "/workspaces/my-repo"
  }
}
JSON

Scale workers:

docker compose up --scale swe-agent=3 -d

Use a host control plane instead of the Docker control-plane service:

docker compose -f docker-compose.local.yml up -d

GitHub Repo Workflow (Clone -> Build -> Draft PR)

Pass repo_url instead of repo_path to let SWE-AF clone and open a draft PR after execution.

curl -X POST http://localhost:8080/api/v1/execute/async/swe-planner.build \
  -H "Content-Type: application/json" \
  -d @- <<'JSON'
{
  "input": {
    "repo_url": "https://github.com/user/my-project",
    "goal": "Add comprehensive test coverage",
    "config": {
      "runtime": "claude_code",
      "models": {
        "default": "sonnet",
        "coder": "opus",
        "qa": "opus"
      }
    }
  }
}
JSON

Requirements:

  • GH_TOKEN in .env with repo scope
  • Repo access for that token

API Reference

Agent endpoints

Core async endpoints (each returns an execution_id immediately):

# Full build: plan -> execute -> verify
POST /api/v1/execute/async/swe-planner.build

# Plan only
POST /api/v1/execute/async/swe-planner.plan

# Execute a prebuilt plan
POST /api/v1/execute/async/swe-planner.execute

# Resume after interruption
POST /api/v1/execute/async/swe-planner.resume_build

Monitoring:

curl http://localhost:8080/api/v1/executions/<execution_id>

Every specialist is also callable directly:

POST /api/v1/execute/async/swe-planner.<agent>
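Since builds are long-running, a client typically polls the executions endpoint until the build reaches a terminal state. The sketch below assumes the response body carries a "status" field with "completed"/"failed" terminal values; adjust to the actual schema. The `fetch` callable would wrap whatever HTTP client you use to GET /api/v1/executions/<execution_id>.

```python
import time

def poll_execution(fetch, interval=10):
    """Poll an async execution until it reaches a terminal state.

    `fetch` is any callable returning the parsed JSON body of
    GET /api/v1/executions/<execution_id>. The "status" field and its
    "completed"/"failed" values are assumptions about the response schema.
    """
    while True:
        execution = fetch()
        if execution.get("status") in ("completed", "failed"):
            return execution
        time.sleep(interval)
```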

Agent execution flow
Agent In -> Out
run_product_manager goal -> PRD
run_architect PRD -> architecture
run_tech_lead architecture -> review
run_sprint_planner architecture -> issue DAG
run_issue_writer issue spec -> detailed issue
run_coder issue + worktree -> code + tests + commit
run_qa worktree -> test results
run_code_reviewer worktree -> quality/security review
run_qa_synthesizer QA + review -> FIX / APPROVE / BLOCK
run_issue_advisor failure context -> adapt / split / accept / escalate
run_replanner build state + failures -> restructured plan
run_merger branches -> merged output
run_integration_tester merged repo -> integration results
run_verifier repo + PRD -> acceptance pass/fail
generate_fix_issues failed criteria -> targeted fix issues
run_github_pr branch -> push + draft PR
Configuration

Pass config to build or execute. Full schema: swe_af/execution/schemas.py

Key Default Description
runtime "claude_code" Model runtime: "claude_code" or "open_code"
models null Flat role-model map (default + role keys below)
max_coding_iterations 5 Inner-loop retry budget
max_advisor_invocations 2 Middle-loop advisor budget
max_replans 2 Build-level replanning budget
enable_issue_advisor true Enable issue adaptation
enable_replanning true Enable global replanning
enable_learning false Enable cross-issue shared memory (continual learning)
agent_timeout_seconds 2700 Per-agent timeout
agent_max_turns 150 Tool-use turn budget
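For example, a config that tightens the three adaptation budgets from the table above (inner, middle, and outer loop) while keeping continual learning on might look like:

```json
{
  "runtime": "claude_code",
  "max_coding_iterations": 3,
  "max_advisor_invocations": 1,
  "max_replans": 1,
  "enable_learning": true
}
```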
Model Role Keys

models supports:

  • default
  • pm, architect, tech_lead, sprint_planner
  • coder, qa, code_reviewer, qa_synthesizer
  • replan, retry_advisor, issue_writer, issue_advisor
  • verifier, git, merger, integration_tester
Resolution order

runtime defaults < models.default < models.<role> (rightmost wins: a role-specific entry overrides models.default, which overrides the runtime defaults)
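The resolution order above can be sketched as a simple lookup chain. The RUNTIME_DEFAULTS values below are placeholders for illustration, not SWE-AF's actual built-in defaults.

```python
# Sketch of role-model resolution: models.<role> beats models.default,
# which beats the runtime default. RUNTIME_DEFAULTS values are assumed
# placeholders, not SWE-AF's real built-ins.
RUNTIME_DEFAULTS = {
    "claude_code": "sonnet",                        # assumed default
    "open_code": "openrouter/minimax/minimax-m2.5", # assumed default
}

def resolve_model(role: str, config: dict) -> str:
    models = config.get("models") or {}
    return (
        models.get(role)            # models.<role> wins if present
        or models.get("default")    # then models.default
        or RUNTIME_DEFAULTS[config.get("runtime", "claude_code")]
    )
```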

Config examples

Minimal:

{
  "runtime": "claude_code"
}

Fully customized:

{
  "runtime": "open_code",
  "models": {
    "default": "minimax/minimax-m2.5",
    "pm": "openrouter/qwen/qwen-2.5-72b-instruct",
    "architect": "openrouter/qwen/qwen-2.5-72b-instruct",
    "coder": "deepseek/deepseek-chat",
    "qa": "deepseek/deepseek-chat",
    "verifier": "openrouter/qwen/qwen-2.5-72b-instruct"
  },
  "max_coding_iterations": 6,
  "enable_learning": true
}
Artifacts
.artifacts/
├── plan/           # PRD, architecture, issue specs
├── execution/      # checkpoints, per-issue logs, agent outputs
└── verification/   # acceptance criteria results
Development
make test
make check
make clean
make clean-examples
Security and Community

SWE-AF is built on AgentField as a first step from single-agent harnesses to autonomous software engineering factories.
