A multi-agent automated software development system using Claude CLI.
```bash
uvx --from git+https://github.com/benthomasson/ftl-sdlc-loop ftl-sdlc-loop --help
```

Developers now routinely use multiple AI agents: code completion, code review, test generation, documentation, refactoring, debugging. A single developer might have 5-10 agents touching their code daily. Multiply that across a team and the ratio becomes clear: there are already more agents using your code than humans.
This isn't a future prediction. It's the present. And the gap is only widening. Every new AI-powered dev tool, every IDE integration, every CI/CD agent adds another non-human consumer of your software. The codebase that gets read, executed, and interpreted by 3 humans and 30 agents should be designed for its actual audience.
If agents are your primary users, then:
- Error messages should be machine-parseable, not just human-readable
- APIs should be predictable and self-describing, not clever
- Documentation should be structured data, not prose narratives
- Type hints aren't optional — they're how agents understand your code
- Conventions matter more than creativity — agents rely on patterns
Traditional software development tries to imagine what users want through requirements, user stories, and post-deployment surveys. But when your user is an agent, you can put it directly in the development loop and get immediate, structured, actionable feedback.
The User agent in this system isn't simulating user stories — it actually runs the code, hits errors, and provides real UX feedback. This isn't role-playing. Claude is the intended consumer of the software being built.
When the User agent runs `python is_prime.py` and hits an `EOFError`, that's not a test case someone wrote. Claude actually tried to use the software and failed. When it reports "the demo crashed in non-interactive mode," that's genuine feedback from real usage by a real consumer.
| Traditional | Agent-First |
|---|---|
| Hypothetical users | Actual user in the loop |
| Delayed feedback | Real-time feedback |
| Interpreted requirements | Direct requirements |
| "Users might want..." | "I need..." |
| 3 humans read your code | 30 agents read your code |
At the end of each stage, every agent asks: "What would make my job easier?"
This surfaces friction immediately:
- Planner: "The task description was ambiguous about error handling"
- Implementer: "The plan didn't specify the expected input format"
- Reviewer: "No type hints made the code harder to review"
- Tester: "Missing edge case documentation"
- User: "The error message didn't tell me what to do next"
These friction points become feature requests. They go back to the Planner, who decides which are worth implementing. The loop continues until the User is satisfied.
Agents are particularly well-suited as users because:
- They articulate frustration — Unlike humans who abandon software silently, agents explain exactly what went wrong and why
- They follow instructions literally — If documentation is unclear, they fail in instructive ways
- They provide structured feedback — Prioritized feature requests, not vague complaints
- They're always available — No user research scheduling, no interview bias
- They represent the majority — Designing for agents means designing for your actual user base
Before development begins, humans and AI build shared understanding together.
This isn't just writing a spec—it's collaborative intelligence:
- AI analyzes the task and identifies gaps
- AI asks clarifying questions
- Human provides answers
- AI validates understanding and checks for contradictions
- Both parties build genuine understanding
```bash
# Interactive shared understanding session
uv run understand.py "build a REST API for user management"
# With context from external sources
uv run understand.py "fix the login bug" --context JIRA-123.md slack-thread.txt
```

This creates `SHARED_UNDERSTANDING.md`, which all agents reference.
See: shared-understanding framework
Every stage commits to git, providing:
- Checkpoints - Recovery points if something goes wrong
- Audit trail - Full history of what each agent did and why
- Visibility - Human can review commits at any time
- Async review - Artifacts persist for later inspection
```text
[supervisor] Start task: ...
[planner] Plan for: ...
[implementer] Implement: fibonacci.py
[reviewer] Code review complete
[tester] Tests and usage documentation
[user] User feedback and feature requests
[supervisor] Iteration 1 complete
```
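Every commit subject starts with a `[role]` tag, so the audit trail can be filtered mechanically. The following is a minimal sketch. It assumes subjects follow the `[role] message` shape shown above. `group_by_role` is a hypothetical helper and is not part of ftl-sdlc-loop.

```python
import re
from collections import defaultdict

# Matches subjects like "[reviewer] Code review complete".
# Assumes the "[role] message" convention shown in the log above.
ROLE_TAG = re.compile(r"^\[(?P<role>[a-z_]+)\]\s+(?P<message>.*)$")


def group_by_role(subjects: list[str]) -> dict[str, list[str]]:
    """Group commit subjects by the agent role that produced them."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for subject in subjects:
        match = ROLE_TAG.match(subject.strip())
        if match:  # skip commits that don't follow the convention
            grouped[match["role"]].append(match["message"])
    return dict(grouped)


log = [
    "[supervisor] Start task: ...",
    "[planner] Plan for: ...",
    "[implementer] Implement: fibonacci.py",
    "[reviewer] Code review complete",
    "[supervisor] Iteration 1 complete",
]
by_role = group_by_role(log)
print(by_role["supervisor"])  # ['Start task: ...', 'Iteration 1 complete']
```

In a real workspace, the input list could come from `git log --format=%s`, which prints only commit subjects, without the hash prefix that `--oneline` adds.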
Every agent reflects after completing their work:
- What went well?
- What information was missing?
- What would make my job easier next time?
- Confidence level / concerns for next stage
This surfaces friction points and improvement ideas throughout the pipeline, not just at the end.
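Since feedback is meant to be structured rather than prose, one option is to capture each reflection as a small record and render it into the stage's markdown artifact. This is a sketch under assumptions: the `SelfReview` class, its field names, and the headings it emits are hypothetical. The source does not specify the format ftl-sdlc-loop actually writes.

```python
from dataclasses import dataclass, field


@dataclass
class SelfReview:
    """Hypothetical record of one agent's end-of-stage reflection."""
    role: str
    went_well: list[str] = field(default_factory=list)
    missing_info: list[str] = field(default_factory=list)
    would_help: list[str] = field(default_factory=list)  # feature requests
    confidence: str = "medium"
    concerns: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the four reflection questions as markdown sections."""
        sections = [
            ("What went well?", self.went_well),
            ("What information was missing?", self.missing_info),
            ("What would make my job easier next time?", self.would_help),
            ("Concerns for next stage", self.concerns),
        ]
        lines = [f"## Self-review ({self.role})", f"Confidence: {self.confidence}"]
        for heading, items in sections:
            lines.append(f"### {heading}")
            # Empty sections are written explicitly so readers can tell "none" from "skipped".
            lines.extend(f"- {item}" for item in items or ["(none)"])
        return "\n".join(lines)


review = SelfReview("implementer", went_well=["plan was clear"],
                    missing_info=["expected input format"], confidence="high")
print(review.to_markdown())
```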
```text
┌─────────────────────────────────────────────────────────────────────┐
│ SUPERVISOR │
│ (orchestrates iterations) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ PLANNER (PM + Architect) │
│ • Decides WHAT and WHY │
│ • Suggests HOW (but implementer has final say) │
│ • Receives feature requests from User │
│ • Self-review: What would make planning easier? │
│ → [git commit: PLAN.md] │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ IMPLEMENTER │
│ • Has ULTIMATE CONTROL of HOW │
│ • Can push back on planner if approach won't work │
│ • Writes code with clear error messages │
│ • Self-review: What was unclear in the plan? │
│ → [git commit: implementation files] │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ REVIEWER │
│ • Feedback to implementer (correctness, errors, usability) │
│ • Feed-forward to tester (what to test, edge cases) │
│ • Verdict: APPROVED or NEEDS_CHANGES │
│ • Self-review: What made review difficult? │
│ → [git commit: REVIEW.md] │
│ ↑ (if NEEDS_CHANGES: loops back to IMPLEMENTER, up to 3 attempts) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ TESTER │
│ • Creates tests based on reviewer notes │
│ • Documents HOW TO USE the software │
│ • Provides usage instructions to User │
│ • Verdict: TESTS_PASSED or TESTS_FAILED │
│ • Self-review: What gaps did testing reveal? │
│ → [git commit: tests + USAGE.md] │
│ ↑ (if TESTS_FAILED: loops back to IMPLEMENTER, up to 3 attempts) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ USER (Claude) │
│ • Actually RUNS the code following tester instructions │
│ • Reports what worked, what failed, what was confusing │
│ • Requests features: "What would make my job easier?" │
│ • Verdict: SATISFIED or NEEDS_IMPROVEMENT │
│ → [git commit: USER_FEEDBACK.md] │
└─────────────────────────────────────────────────────────────────────┘
│
│ (if NEEDS_IMPROVEMENT)
▼
[loops back to PLANNER]
```
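The control flow in the diagram can be sketched as two bounded inner loops nested inside an outer iteration loop. This is a minimal sketch, not the code in `supervisor.py`:

- Each agent is modeled as a plain function that returns a verdict string.
- `run_pipeline`, `inner_loop`, `MAX_ATTEMPTS`, and `max_iterations` are hypothetical names.
- The source does not say what happens when an inner loop uses up its 3 attempts, so this sketch simply moves on to the next stage.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # inner-loop bound from the diagram ("up to 3 attempts")

# Hypothetical agent signature: receives the current context, returns a verdict.
Agent = Callable[[str], str]


def inner_loop(check: Agent, implementer: Agent, fail_verdict: str,
               context: str, trace: list[str]) -> None:
    """Run `check`; while it fails, let the implementer fix and re-check (bounded)."""
    verdict = check(context)
    trace.append(verdict)
    attempts = 0
    while verdict == fail_verdict and attempts < MAX_ATTEMPTS:
        implementer(context)  # implementer addresses the reported problems
        attempts += 1
        verdict = check(context)
        trace.append(verdict)
    # Assumption: if still failing after MAX_ATTEMPTS, continue to the next stage.


def run_pipeline(task: str, planner: Agent, implementer: Agent, reviewer: Agent,
                 tester: Agent, user: Agent, max_iterations: int = 5) -> list[str]:
    """Outer loop: iterate until the User is SATISFIED or iterations run out."""
    trace: list[str] = []
    context = task
    for _ in range(max_iterations):
        planner(context)
        implementer(context)
        inner_loop(reviewer, implementer, "NEEDS_CHANGES", context, trace)
        inner_loop(tester, implementer, "TESTS_FAILED", context, trace)
        verdict = user(context)
        trace.append(verdict)
        if verdict == "SATISFIED":
            break
        # NEEDS_IMPROVEMENT: hand the User's verdict back to the Planner.
        context = f"{task}\n\nPrevious user verdict: {verdict}"
    return trace


# Usage with stub agents: the reviewer rejects once, then approves.
reviews = iter(["NEEDS_CHANGES", "APPROVED"])
trace = run_pipeline(
    "write a function to calculate fibonacci numbers",
    planner=lambda ctx: "PLAN",
    implementer=lambda ctx: "CODE",
    reviewer=lambda ctx: next(reviews),
    tester=lambda ctx: "TESTS_PASSED",
    user=lambda ctx: "SATISFIED",
)
print(trace)  # ['NEEDS_CHANGES', 'APPROVED', 'TESTS_PASSED', 'SATISFIED']
```

Bounding the inner loops keeps a stubborn reviewer or a flaky test from stalling the run. The outer loop's `max_iterations` plays the same role for the User's feedback.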
```bash
# Run from any directory: workspaces are created in cwd
mkdir my-project && cd my-project
# Run a task
ftl-sdlc-loop --workspace fibonacci "write a function to calculate fibonacci numbers"
# Fast mode for simple tasks
ftl-sdlc-loop --workspace two-sum --effort minimal --no-questions "solve the two-sum problem"
```

Control thoroughness vs speed:
```bash
# Fast (~2-5 min): 3 agents, skip review, basic tests
ftl-sdlc-loop --effort minimal "solve two-sum"
# Balanced (~30-60 min): 4 agents, code review, decent tests (default)
ftl-sdlc-loop --effort moderate "build a REST API"
# Production (~2-3 hours): full 5-agent pipeline, comprehensive testing
ftl-sdlc-loop --effort maximum "implement authentication"
```

```bash
# Fully automated: no interactive prompts
ftl-sdlc-loop --workspace my-task --effort minimal --no-questions "solve the problem"
```

The `--no-questions` flag auto-responds to all agent escalations, so the system never blocks waiting for input. Combined with `--effort minimal`, this enables fully unattended batch processing.
```bash
# Clone a repo into a workspace
ftl-sdlc-loop --workspace iris --init-from git@github.com:user/iris.git
# Work on it
ftl-sdlc-loop --workspace iris "add a new feature"
ftl-sdlc-loop --workspace iris --continue "fix the bug from last run"
# Push changes back (artifacts archived to logs/)
ftl-sdlc-loop --workspace iris --push   # Push directly
ftl-sdlc-loop --workspace iris --pr     # Or create a PR
```

Workspaces are created in `workspaces/{name}/` relative to where you run the command.

When pushing, artifact files (`PLAN.md`, `REVIEW.md`, etc.) are archived to `logs/{workspace}_{timestamp}_artifacts.tar.gz` before being removed from the repo.
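Archiving before removal can be done with Python's standard `tarfile` module. This is a sketch, not the tool's implementation. The source specifies only the `logs/{workspace}_{timestamp}_artifacts.tar.gz` pattern. The artifact list, the timestamp format, and the helper name `archive_artifacts` are all assumptions, and removing the files from the repo afterwards is left out.

```python
import tarfile
import time
from pathlib import Path

# Assumed artifact names; the real tool may archive a different set.
ARTIFACTS = ["TASK.md", "PLAN.md", "REVIEW.md", "USAGE.md", "USER_FEEDBACK.md",
             "FINAL_REPORT.md", "beliefs.md", "entries"]


def archive_artifacts(workspace: Path, logs_dir: Path, timestamp: str = "") -> Path:
    """Pack existing artifacts into {logs_dir}/{workspace}_{timestamp}_artifacts.tar.gz."""
    timestamp = timestamp or time.strftime("%Y%m%d_%H%M%S")  # assumed format
    logs_dir.mkdir(parents=True, exist_ok=True)
    archive = logs_dir / f"{workspace.name}_{timestamp}_artifacts.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for name in ARTIFACTS:
            path = workspace / name
            if path.exists():  # skip artifacts this run didn't produce
                tar.add(path, arcname=name)  # directories are added recursively
    return archive
```

Writing the archive outside the workspace (under `logs/`) means the artifacts survive even after they are deleted from the repository that gets pushed.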
Load secrets and configuration for agents:
```bash
# Load .env file into workspace and environment
ftl-sdlc-loop --workspace myproject --env ~/.secrets/myproject.env "build API integration"
# Can combine with --init-from
ftl-sdlc-loop --workspace iris --init-from ~/git/iris --env ~/iris.env "add OAuth support"
```

The `.env` file is:
- Copied to the workspace root
- Automatically added to `.gitignore` (prevents committing secrets)
- Parsed and loaded into the environment for agents to inherit

Supports standard `.env` format: comments (`#`), empty lines, `export` prefix, and quoted values.
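The parsing rules above are simple enough to sketch in a few lines. This illustrative parser is written against those four rules and is not the tool's own code. Escape sequences, inline comments after values, and multi-line values are deliberately not handled. `parse_env` and `load_env` are hypothetical names.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse .env text: comments, blank lines, `export` prefix, quoted values."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")  # split on the first "=" only
        if not sep:
            continue  # not a KEY=VALUE line; ignore it
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]  # strip matching surrounding quotes
        env[key] = value
    return env


def load_env(path: str) -> dict[str, str]:
    """Load a .env file into os.environ so child agent processes inherit it."""
    with open(path, encoding="utf-8") as f:
        env = parse_env(f.read())
    os.environ.update(env)
    return env
```

Updating `os.environ` is enough for inheritance: processes started with `subprocess` receive the parent's environment by default.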
For bigger projects, review the plan before implementation:
```bash
# Step 1: Generate plan only
ftl-sdlc-loop --workspace myproject --plan-only "build a REST API with authentication"
# Step 2: Review and edit the plan
cat workspaces/myproject/PLAN.md
# Edit if needed...
# Step 3: Run with the reviewed plan
ftl-sdlc-loop --workspace myproject --plan workspaces/myproject/PLAN.md "build a REST API with authentication"
```

The `--plan-only` flag runs only the planner, saves `PLAN.md`, and exits for human review.

The `--plan PATH` flag uses an existing plan file and skips the planner stage.
For complex task descriptions, read from a file:
```bash
# Create a detailed task file
cat > task.md << 'EOF'
Build a REST API for user management with the following requirements:
- POST /users - Create user (name, email required)
- GET /users/:id - Get user by ID
- PUT /users/:id - Update user
- DELETE /users/:id - Delete user
Use SQLite for storage. Include input validation and proper error responses.
EOF
# Run with the task file
ftl-sdlc-loop --workspace user-api --prompt-file task.md
```

Work on GitLab issues directly:
```bash
# Clone from GitLab directly (glab auto-detects project)
ftl-sdlc-loop --workspace issue-285 \
  --init-from git@gitlab.com:org/repo.git \
  --gitlab-issue 285 \
  --effort minimal
# Or clone from local bare repo with explicit GitLab remote
ftl-sdlc-loop --workspace issue-285 \
  --init-from ~/git/repo.git \
  --gitlab-remote git@gitlab.com:org/repo.git \
  --gitlab-issue 285 \
  --effort minimal
# After completion, create merge request
ftl-sdlc-loop --workspace issue-285 --gitlab-mr --push
```

GitLab flags:
| Flag | Description |
|---|---|
| `--gitlab-issue NUM` | Fetch issue, assign to self, use as task prompt |
| `--gitlab-mr` | Create merge request after successful run |
| `--gitlab-remote URL` | Add GitLab remote (required for bare repo workflows) |
| `--branch NAME` | Override auto-generated branch name |
The `--gitlab-issue` flag:
- Fetches issue title and description via `glab issue view`
- Assigns the issue to your GitLab user
- Uses the issue body as the task prompt (adds `Closes #N`)
- Auto-generates branch name: `fix/issue-285-description-slug`
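The source shows only the resulting branch pattern, `fix/issue-285-description-slug`. The following sketch shows how such a name could be derived from an issue number and title. It assumes lowercase ASCII slugs, hyphen separators, and a hypothetical 40-character cap on the slug. `branch_name` is an illustrative helper, and the real tool's rules may differ.

```python
import re


def branch_name(issue_number: int, title: str, max_slug: int = 40) -> str:
    """Build a branch name like fix/issue-285-description-slug (assumed rules)."""
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    slug = slug[:max_slug].rstrip("-")  # cap length without a trailing hyphen
    return f"fix/issue-{issue_number}-{slug}" if slug else f"fix/issue-{issue_number}"


print(branch_name(285, "Login fails: empty password!"))
# fix/issue-285-login-fails-empty-password
```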
The `--gitlab-mr` flag:
- Pushes the work branch to origin
- Creates a merge request with the issue title
- Assigns to current user
- Auto-detects MR template from `.gitlab/merge_request_templates/Default.md`
- Uses Claude to intelligently fill in template sections
The `--gitlab-remote` flag:
- Adds a `gitlab` remote to the workspace after cloning
- Required when cloning from a local bare repo (where `origin` points locally)
- Allows `glab` to detect the GitLab project

Requires: `glab` CLI installed and authenticated (`glab auth login`).
Process tasks from a queue file, running unattended:
```bash
echo "write a hello world function" > queue.txt
echo "add error handling" >> queue.txt
ftl-sdlc-loop --continuous --effort minimal --no-questions
```

Build context before development:

```bash
ftl-sdlc-loop --understanding docs/SHARED_UNDERSTANDING.md "build the feature"
```

```bash
cd workspaces/my-task
git log --oneline      # Full audit trail
cat FINAL_REPORT.md    # Summary
cat implementer/*.py   # Generated code
```

```text
src/ftl_sdlc_loop/
├── __init__.py        # Package metadata
├── supervisor.py      # Pipeline orchestrator with feedback loop
├── agent.py           # Agent runner utility
└── understand.py      # Phase 0: Shared understanding builder
```
```text
your-project/
├── workspaces/                 # Named workspaces (each a git repo)
│   └── {workspace}/
│       ├── .git/
│       ├── .env                # Environment variables (via --env, gitignored)
│       ├── TASK.md, PLAN.md, REVIEW.md, USAGE.md
│       ├── FINAL_REPORT.md
│       ├── implementer/*.py    # Generated code
│       ├── tester/test_*.py    # Generated tests
│       ├── beliefs.md          # Claim tracking
│       └── entries/iteration-{N}/*.md   # Audit trail
├── agents/                     # Agent session directories
├── pids/                       # PID files for running agents
├── logs/                       # Archived artifacts from --push
└── multiagent.log              # Verbose logging
```
- Session Isolation: Each agent has its own directory per workspace (`agents/{workspace}/{role}/`). Claude CLI stores conversation history per directory, so each agent maintains its own context.
- Git Coordination: Every stage commits artifacts. This provides checkpoints, an audit trail, and async review capability.
- Self-Review: Each agent reflects on its work, surfacing friction points and improvement ideas.
- Inner Loops: Before reaching the User, two inner feedback loops catch problems early (up to 3 attempts each):
  - Reviewer → Implementer: If the reviewer returns `NEEDS_CHANGES`, the implementer fixes issues before testing begins.
  - Tester → Implementer: If tests fail (`TESTS_FAILED`), the implementer fixes bugs before the user tries the code.
- Outer Feedback Loop: User feedback triggers new iterations. The Planner reviews feature requests and decides which to implement.
- Structured Verdicts & Exit Gates: Agents emit machine-parseable verdict blocks (`STATUS:` + `OPEN_ISSUES:`) instead of prose. An exit gate catches contradictions: if an agent declares APPROVED but lists open issues, the supervisor overrides the verdict or escalates to a human. This prevents cascading belief failures, where bugs propagate through the pipeline because a positive keyword appeared in otherwise-negative feedback.
- Convergence: The loop ends when the User is SATISFIED or the maximum number of iterations is reached.
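Here is a minimal sketch of a structured verdict block and an exit gate that checks it. Only the `STATUS:`/`OPEN_ISSUES:` keys and the override-or-escalate behavior come from the description above. The following are all assumptions made for illustration:

- the exact block layout (a `STATUS:` line, an `OPEN_ISSUES:` line, then `- ` bullets)
- the set of positive statuses
- the names `parse_verdict` and `gate`
- the `"ESCALATE"` return value

```python
import re
from dataclasses import dataclass

POSITIVE = {"APPROVED", "TESTS_PASSED", "SATISFIED"}


@dataclass
class Verdict:
    status: str
    open_issues: list[str]


def parse_verdict(output: str) -> Verdict:
    """Extract the STATUS/OPEN_ISSUES block instead of keyword-matching prose."""
    status = re.search(r"^STATUS:\s*(\w+)", output, re.MULTILINE)
    if status is None:
        raise ValueError("agent output has no STATUS: line")
    issues: list[str] = []
    in_issues = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("OPEN_ISSUES:"):
            in_issues = True
            rest = stripped[len("OPEN_ISSUES:"):].strip()
            if rest and rest.lower() != "none":
                issues.append(rest)  # single issue written on the same line
        elif in_issues and stripped.startswith("- "):
            issues.append(stripped[2:].strip())
        elif in_issues and stripped:
            in_issues = False  # any other non-blank line ends the issue list
    return Verdict(status.group(1), issues)


def gate(verdict: Verdict) -> str:
    """Exit gate: a positive status with open issues is a contradiction."""
    if verdict.status in POSITIVE and verdict.open_issues:
        return "ESCALATE"  # assumed action; the supervisor could also override
    return verdict.status


report = "Looks good overall.\nSTATUS: APPROVED\nOPEN_ISSUES:\n- is_prime(4.9) returns True\n"
print(gate(parse_verdict(report)))  # ESCALATE
```

A naive check like `"APPROVED" in report` would pass this report straight through. Parsing the block makes the listed issue impossible to ignore.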
The pipeline integrates beliefs as a library for tracking claims and contradictions across multi-agent systems.
In early test runs, the User agent declared SATISFIED despite documented bugs — the string-matching verdict parser couldn't distinguish "satisfied despite issues" from "satisfied, no issues." Bugs propagated through the pipeline unchallenged because each agent trusted the previous agent's positive verdict without checking the evidence. This is a cascading belief failure: one wrong claim ("code is correct") becomes an unquestioned assumption for every downstream agent.
When the beliefs CLI is installed, the supervisor registers claims at each pipeline stage:
| After stage | Claim type | Example |
|---|---|---|
| Planner | AXIOM | "Implementation should use recursive approach" |
| Implementer | DERIVED | "Created is_prime.py" |
| Reviewer | WARNING | "is_prime(4.9) returns True" |
| Tester | OBSERVATION | "Tests PASSED" |
Before the User stage, `beliefs compact` produces a structured summary of the current belief state, replacing raw prose accumulation with a queryable snapshot. The exit gate then checks: if the User declares SATISFIED but the beliefs system has active WARNINGs, it escalates to a human rather than terminating the loop.
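Stage-by-stage claim registration and the SATISFIED-with-active-WARNINGs check can be modeled with a small in-memory registry. This sketch does not use the `beliefs` library's API, which this document does not describe. `BeliefRegistry`, `register`, `active_warnings`, and `exit_gate` are hypothetical names chosen to mirror the table above. The `compact` method is only a stand-in for the real `beliefs compact` command.

```python
from dataclasses import dataclass, field
from enum import Enum


class ClaimType(Enum):
    AXIOM = "AXIOM"
    DERIVED = "DERIVED"
    WARNING = "WARNING"
    OBSERVATION = "OBSERVATION"


@dataclass
class Claim:
    stage: str
    kind: ClaimType
    text: str
    active: bool = True  # set to False once a claim is resolved or retracted


@dataclass
class BeliefRegistry:
    claims: list[Claim] = field(default_factory=list)

    def register(self, stage: str, kind: ClaimType, text: str) -> Claim:
        claim = Claim(stage, kind, text)
        self.claims.append(claim)
        return claim

    def active_warnings(self) -> list[Claim]:
        return [c for c in self.claims if c.kind is ClaimType.WARNING and c.active]

    def compact(self) -> str:
        """One line per active claim (a stand-in for a structured summary)."""
        return "\n".join(f"[{c.kind.value}] ({c.stage}) {c.text}"
                         for c in self.claims if c.active)


def exit_gate(user_verdict: str, registry: BeliefRegistry) -> str:
    """Stop only if the User is satisfied AND no WARNING is still active."""
    if user_verdict != "SATISFIED":
        return "CONTINUE"  # NEEDS_IMPROVEMENT: loop back to the Planner
    if registry.active_warnings():
        return "ESCALATE_TO_HUMAN"
    return "DONE"


registry = BeliefRegistry()
registry.register("planner", ClaimType.AXIOM, "Implementation should use recursive approach")
warning = registry.register("reviewer", ClaimType.WARNING, "is_prime(4.9) returns True")
registry.register("tester", ClaimType.OBSERVATION, "Tests PASSED")
print(exit_gate("SATISFIED", registry))  # ESCALATE_TO_HUMAN
```

The point of the gate is that a downstream agent's positive verdict cannot silently outvote an upstream WARNING. The warning must be explicitly resolved first.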
The `beliefs` library is a required dependency: it's declared in `supervisor.py`'s inline script metadata and installed automatically by `uv run`.
| File | Created By | Purpose |
|---|---|---|
| `INITIAL_ANALYSIS.md` | Understand | Initial task analysis and questions |
| `VALIDATION.md` | Understand | Human answers and validation |
| `SHARED_UNDERSTANDING.md` | Understand | Final shared understanding document |
| `TASK.md` | Supervisor | Original task description |
| `PLAN.md` | Planner | Requirements, design decisions, success criteria |
| `IMPLEMENTATION.md` | Implementer | Implementation notes and self-review |
| `*.py` | Implementer | Generated code files |
| `REVIEW.md` | Reviewer | Code review with feedback and feed-forward |
| `USAGE.md` | Tester | Usage instructions for the User |
| `test_*.py` | Tester | Test files |
| `USER_FEEDBACK.md` | User | Usage report and feature requests |
| `entries/iteration-N/*.md` | Supervisor | Full agent outputs per iteration (audit trail) |
| `beliefs.md` | Supervisor | Belief registry for claim tracking |
| `ITERATION_N_SUMMARY.md` | Supervisor | Per-iteration summary |
| `FINAL_SUMMARY.md` | Supervisor | Final status and history |
```bash
# Install as a CLI tool
uv tool install git+https://github.com/benthomasson/ftl-sdlc-loop
# Then use it anywhere
ftl-sdlc-loop "write a function to calculate fibonacci numbers"
ftl-sdlc-loop --workspace myproject --init-from /path/to/repo
```

Or run directly without installing:

```bash
# Clone and run with uv
git clone https://github.com/benthomasson/ftl-sdlc-loop
cd ftl-sdlc-loop
uv run supervisor.py "your task here"
```

- uv - Python package manager
- Claude CLI - `claude` command available in PATH
- beliefs - Installed automatically as a dependency