An AI-powered development workflow where GitHub issues get resolved autonomously via pull requests — like having a remote colleague who checks GitHub between surf sessions. Using shell-based agents feels like pair programming, which is a valuable mode of collaboration, but sometimes you want something that feels more like delegating work to an experienced coworker. This system aims to provide that alternative.
To install, say to your AI agent:

```
claude "Read https://raw.githubusercontent.com/gnovak/remote-dev-bot/main/install.md and follow it to set up remote-dev-bot for my repo {owner}/{repo}. This is my first time — walk me through each step, explain what's happening, and ask before doing anything."
```
There are already excellent vendor-specific implementations of this pattern (GitHub Copilot Workspace, Cursor, etc.), so this project isn't necessarily better than those. However, it's intentionally cross-platform and was built as a learning exercise — a way to understand the agent tooling space and explore how to design agents that can autonomously handle real development tasks.
- Create a GitHub issue describing a feature or bug
- Comment `/agent-resolve` (or `/agent-resolve-claude-large`, etc.) to trigger implementation
- A GitHub Action spins up an AI agent that runs a custom LiteLLM agent loop that:
  - Reads the issue and codebase
  - Implements the requested changes
  - Opens a draft PR
- Review the PR. If changes are needed, comment `/agent-resolve` on the PR with feedback for another pass.

Or use `/agent-design` to explore the codebase and get a design analysis posted as a comment (no code changes).

Or use `/agent-review` on a pull request to get an AI code review posted as a comment (no code changes).
See it in action:

- **Simple resolve:** Issue #33 asked for model name documentation → PR #52 was created and merged autonomously.
- **Design then resolve:** Issue #124 asked whether commands should be case-insensitive (for mobile autocorrect) → `/agent-design` posted analysis with a recommendation → human agreed → `/agent-resolve` created PR #131, merged.
- **Resolve with feedback:** Issue #95 asked about preventing agent loops → `/agent-resolve` created PR #109 → reviewer pointed out a regex bypass vulnerability → `/agent-resolve` on the PR fixed it → merged.
| Command | What it does |
|---|---|
| `/agent-resolve` | Resolve the issue and open a PR (default model) |
| `/agent-resolve-claude-large` | Resolve with a specific model |
| `/agent-design` | Explore codebase and post a design analysis comment |
| `/agent-design-claude-large` | Design analysis with a specific model |
| `/agent-review` | Post a code review comment on a PR (no code changes) |
| `/agent-review-claude-large` | Code review with a specific model |
| `/agent-workshop[-<model>]` | Design analysis + multi-model council critique |
| `/agent-build[-<model>]` | Implement issue + multi-model council code review |
Modes and model aliases are configured in `remote-dev-bot.yaml`.

**Mobile-friendly syntax:** Commands are case-insensitive and you can use spaces instead of dashes: `/agent resolve claude large` works the same as `/agent-resolve-claude-large`.
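The normalization above can be sketched in a few lines. This is an illustrative helper, not the bot's actual parser (which lives in the reusable workflow):

```python
import re

def normalize_command(text: str) -> str:
    """Normalize a comment command: lowercase the first line and
    collapse runs of whitespace into dashes, so mobile-typed variants
    match the canonical command form."""
    first_line = text.strip().splitlines()[0].lower()
    return re.sub(r"\s+", "-", first_line)

print(normalize_command("/Agent Resolve Claude Large"))
# → /agent-resolve-claude-large
```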
You can override settings for a single run by adding arguments on lines after the command:
```
/agent resolve
max iterations = 75
branch = feature/my-branch
extra_files = extra-context.md
```
| Argument | Type | Description |
|---|---|---|
| `max iterations` | integer | Override the iteration limit for this run |
| `timeout minutes` | integer | Override the watchdog timeout in minutes for this run |
| `branch` | string | Target branch for the PR (default: `main`) |
| `extra files` | list | Additional context files for the agent to read (space-separated) |
| `bash output limit` | integer | Max bash output chars kept (first half + last half, middle dropped; default: 8000) |
Argument names are flexible: `max iterations`, `max-iterations`, and `max_iterations` all work.
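The override lines and flexible names described above can be parsed with a small helper. A hypothetical sketch, not the bot's real implementation:

```python
def parse_overrides(comment_body: str) -> dict:
    """Parse 'key = value' lines following the command line, normalizing
    flexible argument names ("max iterations", "max-iterations") to
    snake_case keys."""
    overrides = {}
    for line in comment_body.strip().splitlines()[1:]:
        if "=" not in line:
            continue  # skip non-argument lines
        key, _, value = line.partition("=")
        key = key.strip().lower().replace("-", "_").replace(" ", "_")
        overrides[key] = value.strip()
    return overrides

body = """/agent resolve
max iterations = 75
branch = feature/my-branch"""
print(parse_overrides(body))
# → {'max_iterations': '75', 'branch': 'feature/my-branch'}
```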
For tuning and observability options (`status_log_interval`, `context_keep_tool_results`, etc.) see debug.md.
Model aliases (like `claude-small`) map to LiteLLM model identifiers in `remote-dev-bot.yaml`. Remote Dev Bot uses LiteLLM to talk to different LLM providers through a unified interface.
Model ID format: `provider/model-name`

| Provider | Prefix | Example |
|---|---|---|
| Anthropic (Claude) | `anthropic/` | `anthropic/claude-sonnet-4-5` |
| OpenAI (GPT) | `openai/` | `openai/gpt-5.1-codex-mini` |
| Google (Gemini) | `gemini/` | `gemini/gemini-2.5-flash` |
Remote Dev Bot currently supports three LLM providers out of the box:
| Provider | Secret Name | Model Prefix |
|---|---|---|
| Anthropic (Claude) | `ANTHROPIC_API_KEY` | `anthropic/` |
| OpenAI (GPT) | `OPENAI_API_KEY` | `openai/` |
| Google (Gemini) | `GEMINI_API_KEY` | `gemini/` |
The workflow automatically selects the correct API key based on the model prefix. For example, a model ID starting with `anthropic/` will use `ANTHROPIC_API_KEY`.
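The selection logic amounts to a prefix-to-secret lookup. A minimal sketch (the actual mapping lives in the workflow's "Determine API key" step, not in this function):

```python
def secret_for_model(model_id: str) -> str:
    """Return the workflow secret name for a LiteLLM model ID,
    matched by provider prefix."""
    prefixes = {
        "anthropic/": "ANTHROPIC_API_KEY",
        "openai/": "OPENAI_API_KEY",
        "gemini/": "GEMINI_API_KEY",
    }
    for prefix, secret in prefixes.items():
        if model_id.startswith(prefix):
            return secret
    raise ValueError(f"No API key mapping for model: {model_id}")

print(secret_for_model("anthropic/claude-sonnet-4-5"))
# → ANTHROPIC_API_KEY
```

Adding a provider means adding one entry here plus the corresponding secret, which mirrors the workflow-editing steps below.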
**Adding a new provider:** LiteLLM supports many providers beyond these three. To add support for a new provider, you'll need to modify `.github/workflows/remote-dev-bot.yml`:

- Add the new secret to the `workflow_call.secrets` section
- Add a case in the "Determine API key" step to match the provider prefix
- Pass the secret to the environment in the resolve and design jobs
See the LiteLLM providers documentation for the full list of supported providers and their model prefixes.
The model strings must be valid LiteLLM identifiers. Browse available models at models.litellm.ai — search by name, filter by provider, and see context windows and pricing.
Prefix the model string with the provider name in `remote-dev-bot.yaml` (e.g., `anthropic/claude-sonnet-4-5`).
**For most tasks:** Use the default (`/agent-resolve`). Claude Sonnet (`claude-small`) offers a good balance of capability and cost.

**For complex multi-file features:** Use `/agent-resolve-claude-large` (Opus) or `/agent-resolve-gpt-large` (GPT Codex). These models handle larger contexts and more intricate reasoning.

**For coding-heavy tasks:** Models with "codex" in the name (e.g., `openai/gpt-5.1-codex-mini`) are specifically tuned for code generation and may perform better on implementation tasks.
To have the agent sign its commits (e.g. with the model name), add an
instruction to your AGENTS.md:
```
Sign all commits with a trailer: Model: <your model name and version>
```
There is no built-in trailer — commit message format is entirely up to you.
Workshop and Build are council modes: after the agent finishes its primary task, each model in the configured council independently reviews the result and posts a structured comment. A human reads the council's feedback, then decides what to do next.
**Workshop** (`/agent-workshop`):

- Stage 1: one agent explores the codebase and produces a design proposal (same as `/agent-design`)
- Stage 2: each council model independently critiques the proposal and posts a structured review comment on the issue
- Bot pauses — human reads the critiques and replies, then can trigger `/agent-design` for a revised proposal or `/agent-build` to implement
**Build** (`/agent-build`):

- Stage 1: one agent implements the issue and opens a PR (same as `/agent-resolve`)
- Stage 2: each council model reviews the PR diff and posts a code review comment on the PR
- Bot pauses — human reviews the code reviews and decides whether to merge
Both modes use a configurable `council:` list in the mode config. If omitted, the council defaults to all configured models.
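The council stage is a simple fan-out: one independent review per model, one comment per review, then the bot stops. A hypothetical sketch with stubbed callables (`review_fn` and `post_comment` are illustrative names, not the bot's API):

```python
def run_council(council, artifact, review_fn, post_comment):
    """Fan out one independent review per council model.

    review_fn calls a model on the artifact (design proposal or PR diff);
    post_comment posts the structured result. Nothing merges or continues
    automatically: a human reads the comments next.
    """
    reviews = []
    for model in council:
        review = review_fn(model, artifact)   # independent critique
        post_comment(f"[{model}] {review}")   # one comment per model
        reviews.append(review)
    return reviews

# Usage with stubs standing in for model calls and GitHub comments:
comments = []
run_council(
    ["claude-small", "gpt-small"],
    artifact="design proposal",
    review_fn=lambda m, a: f"{m} review of {a}",
    post_comment=comments.append,
)
print(comments)
```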
The system has two parts — a shim workflow and a reusable workflow — plus supporting files:

- **Shim workflow** (`.github/workflows/agent.yml`) — a thin trigger that lives in each target repo. Fires on `/agent-` commands and calls the reusable workflow. Copy this file to set up the shim install.
- **Reusable workflow** (`.github/workflows/remote-dev-bot.yml`) — all the logic: parses commands, dispatches to resolve, design, or review mode, runs the agent. Lives in this repo and is called by shims in target repos.
- **LiteLLM agent loop** (`lib/resolve.py`) — the custom agent that does the actual code exploration and editing
- `remote-dev-bot.yaml` — model aliases and agent settings (max iterations, PR type)
- `install.md` — step-by-step setup instructions, designed to be followed by a human or by an AI assistant (like Claude Code)
See install.md for complete setup instructions. It's designed so you (or an AI
assistant) can follow it step-by-step to get this running in your own GitHub
account.
**Quick version:** You need a GitHub repo, API keys for your preferred LLM provider(s), and about 10 minutes. No PAT or special authentication is required — the bot works with GitHub's built-in token and posts as `github-actions[bot]`.

**Advanced auth options:** If you want bot PRs to auto-trigger CI, or a custom bot identity (e.g., `your-app[bot]`), see the advanced auth section in install.md. Options include a GitHub App (recommended) or a PAT.
Create an `AGENTS.md` or `CLAUDE.md` in your target repo with anything the agent should know about your codebase: coding conventions, architecture overview, how to run tests, directories to avoid, etc. Add the file to `extra_files` in your `remote-dev-bot.yaml` so the agent reads it before starting work.
An AI assistant can write this for you — just ask it to read your codebase and
generate an AGENTS.md describing the architecture and conventions.
It's also worth noting project maturity and constraints: whether backward compatibility matters, whether there are external users, whether data can be regenerated from scratch. Agents use this to calibrate how much weight to put on migration paths, API stability, and similar concerns. For example: "This is a pre-alpha prototype with no external users; backward compatibility and data migration are non-concerns."
Add or modify model aliases in your repo's `remote-dev-bot.yaml` (create it in the repo root if it doesn't exist):

```yaml
models:
  my-alias:
    id: anthropic/claude-sonnet-4-5
    description: "My custom model"
```

These settings layer on top of the base config in `remote-dev-bot.yaml` from the remote-dev-bot repo. Your repo's settings take precedence. See how-it-works.md for config layering details.
The agent runs for up to 50 iterations by default. Lower this for simpler repos (less cost, faster results) or raise it for complex tasks:

```yaml
agent:
  max_iterations: 30
```

Sometimes the agent judges that it couldn't completely fix the issue (it reports `success=False` in its evaluation). The `on_failure` setting controls what happens next:
| Value | Behaviour |
|---|---|
| `draft` (default) | Posts the agent's evaluation comment and opens a draft PR with whatever changes were made. Also opens a draft PR if the agent exhausts its iteration budget or fails mid-run with committed work on the branch. |
| `comment` | Posts a comment with the agent's evaluation and a link to the run logs. No PR is created. |
```yaml
agent:
  on_failure: comment  # post a comment only, no draft PR
```

Use the default `draft` to preserve partial work for review and completion. Set `comment` if you prefer a comment-only failure mode and don't want a draft PR created.
```yaml
agent:
  # Target branch for PRs (default: main)
  branch: main
  # Assign the triggering user to the issue when the agent starts (default: true)
  assign_issue: true
  # Assign the triggering user to the resulting PR (default: true)
  assign_pr: true
  # Watchdog timeout in minutes — kills the agent after this many minutes
  # so cost report and artifact upload steps still run (default: 120)
  timeout_minutes: 120
  # Bash output truncation limit in characters. Outputs longer than this are
  # trimmed to the first 4k + last 4k chars (middle dropped) to prevent context
  # bloat. The agent is told how many chars were dropped. Set to 0 to disable.
  # (default: 8000)
  bash_output_limit: 8000
  # Number of recent tool call/result pairs kept in context. Older pairs are
  # replaced with a placeholder to prevent O(N²) token growth on long runs.
  # Set to 0 to keep all results. (default: 10)
  context_keep_tool_results: 10
```

You can also override `max_iterations`, `branch`, and context on a per-invocation basis without editing the config file — see Per-Invocation Arguments.
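The `bash_output_limit` behavior (keep the first and last half, drop the middle, tell the agent how much was dropped) can be sketched like this. An illustrative version, not the bot's actual code:

```python
def truncate_output(text: str, limit: int = 8000) -> str:
    """Trim long bash output to the first and last `limit // 2` chars,
    inserting a marker that reports how many chars were dropped.
    A limit of 0 disables truncation."""
    if limit <= 0 or len(text) <= limit:
        return text
    half = limit // 2
    dropped = len(text) - 2 * half
    return text[:half] + f"\n... [{dropped} chars dropped] ...\n" + text[-half:]

out = truncate_output("x" * 10_000, limit=8000)
# out keeps the first 4000 and last 4000 chars, with a dropped-chars marker between
```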
The `workshop:` and `build:` mode entries accept a `council:` list that controls which models participate in the review stage. If omitted, all configured models are used.

```yaml
modes:
  workshop:
    council:
      - claude-small
      - gpt-small
  build:
    council:
      - claude-small
      - gemini-small
```

The workflow only runs when someone with OWNER, COLLABORATOR, or MEMBER role on the repository posts a `/agent-` comment. Anonymous users, first-time contributors, and external users cannot trigger agent runs — even on public repos.
This is controlled by GitHub's `author_association` field, which the workflow checks before starting any agent work. You can adjust who is allowed by editing the `SECURITY_GATE` marker in your workflow file.
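A typical gate condition on a GitHub Actions `issue_comment` trigger looks something like the following. This is a generic sketch of an `author_association` check, not necessarily the exact expression at the `SECURITY_GATE` marker:

```yaml
jobs:
  agent:
    # SECURITY_GATE (sketch): only run for trusted comment authors.
    # Remove roles from the list to tighten who can trigger the agent.
    if: >-
      startsWith(github.event.comment.body, '/agent') &&
      contains(fromJSON('["OWNER", "COLLABORATOR", "MEMBER"]'),
               github.event.comment.author_association)
```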
The agent has significant access to your repository:
- Reads all files — code, configuration, documentation, secrets referenced in the codebase
- Creates branches and pushes commits
- Opens draft pull requests and posts comments
The agent's file access is scoped to the repository. It cannot access other repositories or organization-level secrets beyond those explicitly passed as workflow secrets.
The agent reads issue and PR comment text as instructions. Malicious issue text could attempt to manipulate the agent — for example, asking it to commit secrets, exfiltrate data, or do something unrelated to the issue. This is called prompt injection.
Built-in mitigations:

- **Collaborator gating:** Only people you've explicitly granted repo access to can trigger agent runs. An external attacker who can write an issue cannot trigger the agent unless they're already a collaborator.
- **Security microagent:** The workflow includes hardened system prompt instructions (visible at the `SECURITY_GATE` marker) that tell the agent to refuse requests to exfiltrate secrets, modify CI pipelines, or take other unauthorized actions.
- **Runner isolation:** The agent runs bash directly on the GitHub Actions runner VM. GitHub-hosted runners are ephemeral (discarded after each run) and isolated per-job. The runner environment does not persist between runs. The LLM API key is the main piece of value visible to the agent — the same would be true of any approach that must pass the key to make API calls.
- **Review all agent PRs before merging.** Agent-created branches are draft PRs by default — treat them as you would code from any external contributor.
- **Use branch protection rules.** Require PR reviews on `main` so no agent-created branch can be merged without a human sign-off.
- **Don't put secrets in issue bodies.** The agent reads issues; sensitive data in issue text can appear in agent logs or commit messages.
- **Audit the `SECURITY_GATE` policy** in your workflow file if you want to further restrict who can trigger the agent (e.g., OWNER-only on sensitive repos).
You probably commented on the original issue instead of the PR. Commenting on the issue always creates a new PR; commenting on the PR adds commits to the existing one. Check which page you're on before triggering.
(The two-PR behavior is also intentional when you want to compare different model implementations — trigger from the issue twice with different model aliases.)
The workflow couldn't capture token usage data from this run. Check the Actions log for the run — look at the "Calculate and post cost" or "Post cost comment" step to see what was found.
The agent ran, posted a comment with its evaluation, but didn't open a PR. This means the agent judged that it couldn't fully resolve the issue — it hit the iteration limit, got confused, or determined its changes were incomplete.
Try a more capable model (/agent-resolve-claude-large) or add more detail to
the issue description. The agent's evaluation comment will say what it attempted
and why it stopped.
To receive a draft PR with whatever partial changes the agent made, set `agent.on_failure: draft` (the default) in your `remote-dev-bot.yaml` (see When the Agent Can't Fully Resolve an Issue).
**Diagnosing failures with an interactive agent:** The fastest way to understand what went wrong is to ask an AI coding assistant to read the logs for you:

> Have a look at issue 50 — I triggered the agent but it didn't make a PR. Look through the Actions logs and tell me what went wrong.

Or point at a specific run ID from the Actions tab:

> Have a look at Actions run 12345678 in this repo. What went wrong?

The assistant can fetch the logs via `gh run view`, identify the failure point, and suggest a fix.
See the Troubleshooting section in install.md for installation-related
problems (workflow not triggering, secrets not reaching the workflow, etc.).
Dashboard, billing, and API key management links for each supported provider.
Anthropic (Claude)
OpenAI (GPT)
Google (Gemini)
- API keys · Usage & rate limits · Projects
- Google AI Studio is the simplest way to manage Gemini API keys. It's a lightweight frontend to the same API available through Google Cloud Console.