AgentOps

Coding agents forget everything between sessions. This fixes that.

How It Works · Install · See It Work · Skills · CLI · FAQ

From goal to shipped code — agents research, plan, and implement in parallel. Councils validate before and after. Every learning feeds the next session.

How It Works

Coding agents get a blank context window every session. AgentOps is a toolbox of primitives — pick the ones you need, skip the ones you don't. Every skill works standalone. Swarm any of them for parallelism. Chain them into a pipeline when you want structure. Knowledge compounds between sessions automatically.

DevOps' Three Ways — applied to the agent loop as composable primitives:

Flow (/research, /plan, /crank, /swarm, /rpi): orchestration skills that move work through the system. Single-piece flow, minimizing context switches. Swarm parallelizes any skill; crank runs dependency-ordered waves; rpi chains the full pipeline.
Feedback (/council, /vibe, /pre-mortem, hooks): shorten the feedback loop until defects can't survive it. Independent judges catch issues before code ships. Hooks make the rules unavoidable — validation gates, push blocking, standards injection. Problems found Friday don't wait until Monday.
Learning (.agents/, ao CLI, /retro, /knowledge): stop rediscovering what you already know. Every session extracts learnings into an append-only ledger, scores them by freshness, and re-injects the best ones at next session start. Session 50 knows what session 1 learned the hard way.

Here's what that looks like — your agent validates a PR, and the council verdict, decisions, and patterns are automatically written to .agents/. Three weeks later, different task, but your agent already knows:

> /research "retry backoff strategies"

[inject] 3 prior learnings loaded (freshness-weighted):
  - Token bucket with Redis (established, high confidence)
  - Rate limit at middleware layer, not per-handler (pattern)
  - /login endpoint was missing rate limiting (decision)
[research] Found prior art in your codebase + injected context
           Recommends: exponential backoff with jitter, reuse existing Redis client

Session 5 didn't start from scratch — it started with what session 1 learned. Stale insights decay automatically.

Local-only — no telemetry, no cloud, no accounts. Nothing phones home. Everything is open source — audit it yourself.
Multi-runtime — Claude Code, Codex CLI, Cursor, OpenCode. Skills are portable across runtimes (/converter exports to native formats).
Multi-model councils — independent judges (Claude + Codex) debate before code ships. Not advisory — validation gates block merges until they pass.

Install

# Claude Code (recommended): marketplace + plugin install
claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

# Codex CLI / Cursor
npx skills@latest add boshu2/agentops --all -g

# OpenCode
curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Then type /quickstart in your agent chat. Not sure which skill to run? See the Skill Router.

For Claude plugin installs, skills are available immediately after plugin install/update (restart Claude Code if prompted). To enable hooks and flywheel automation, install the ao CLI and run ao init --hooks in each repo.

claude plugin install is the primary path for Claude Code. npx skills remains the cross-agent install path for Codex CLI, Cursor, and mixed-runtime setups.

The ao CLI — powers the knowledge flywheel

Skills work standalone. The ao CLI powers the automated learning loop — knowledge extraction, injection with freshness decay, maturity lifecycle, and progress gates. Install it when you want knowledge to compound between sessions.

brew tap boshu2/agentops https://github.com/boshu2/homebrew-agentops && brew install agentops
cd /path/to/your/repo
ao init --hooks

Update to the latest CLI later with:

brew update && brew upgrade agentops
ao version

Running ao init --hooks installs 30+ hooks across core lifecycle events:

Event	What happens
SessionStart	Extract from prior session, inject top learnings (freshness-weighted), check progress gates
SessionEnd	Mine transcript for knowledge, record session outcome, expire stale artifacts, evict dead knowledge
PreToolUse	Inject coding standards before edits, gate dangerous git ops, validate before push
PostToolUse	Advance progress ratchets, track citations
TaskCompleted	Validate task output against acceptance criteria
Stop/PreCompact	Close feedback loops, snapshot before compaction

OpenCode — plugin + skills

The OpenCode install script installs 7 hooks (tool enrichment, audit logging, compaction resilience) and symlinks all skills. Restart OpenCode after install. Details: .opencode/INSTALL.md

Configuration — environment variables

All optional. AgentOps works out of the box with no configuration.

Council / validation:

Variable	Default	What it does
`COUNCIL_TIMEOUT`	120	Judge timeout in seconds
`COUNCIL_CLAUDE_MODEL`	sonnet	Claude model for judges (`opus` for high-stakes)
`COUNCIL_CODEX_MODEL`	(user's Codex default)	Override Codex model for `--mixed`
`COUNCIL_EXPLORER_MODEL`	sonnet	Model for explorer sub-agents
`COUNCIL_EXPLORER_TIMEOUT`	60	Explorer timeout in seconds
`COUNCIL_R2_TIMEOUT`	90	Debate round 2 timeout in seconds

Hooks:

Variable	Default	What it does
`AGENTOPS_HOOKS_DISABLED`	0	`1` to disable all hooks (kill switch)
`AGENTOPS_PRECOMPACT_DISABLED`	0	`1` to disable pre-compaction snapshot
`AGENTOPS_TASK_VALIDATION_DISABLED`	0	`1` to disable task validation gate
`AGENTOPS_SESSION_START_DISABLED`	0	`1` to disable session-start hook
`AGENTOPS_EVICTION_DISABLED`	0	`1` to disable knowledge eviction
`AGENTOPS_GITIGNORE_AUTO`	1	`0` to skip auto-adding `.agents/` to `.gitignore`
`AGENTOPS_WORKER`	0	`1` to skip push gate (for worker agents)

Full reference with examples and precedence rules: docs/ENV-VARS.md
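To make the defaults in the tables above concrete, here is a minimal sketch of reading a few of those variables with their documented fallbacks. The variable names and default values come from the tables; the Settings class and load_settings function are hypothetical names for illustration, and AgentOps itself may parse and prioritize these values differently (see docs/ENV-VARS.md for the precedence rules).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    council_timeout_s: int
    claude_model: str
    explorer_timeout_s: int
    r2_timeout_s: int
    hooks_disabled: bool


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        council_timeout_s=int(env.get("COUNCIL_TIMEOUT", "120")),
        claude_model=env.get("COUNCIL_CLAUDE_MODEL", "sonnet"),
        explorer_timeout_s=int(env.get("COUNCIL_EXPLORER_TIMEOUT", "60")),
        r2_timeout_s=int(env.get("COUNCIL_R2_TIMEOUT", "90")),
        # "1" is the documented kill-switch value; anything else leaves hooks on.
        hooks_disabled=env.get("AGENTOPS_HOOKS_DISABLED", "0") == "1",
    )
```

Passing a plain dict instead of os.environ keeps the helper easy to test, for example load_settings({"COUNCIL_TIMEOUT": "300"}).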

What AgentOps touches, and how to undo it:

What	Where	How to reverse
Skills	Global skills dir (outside your repo; for Claude Code: `~/.claude/skills/`)	`npx skills@latest remove boshu2/agentops -g`
Knowledge artifacts	`.agents/` in your repo (git-ignored by default)	`rm -rf .agents/`
Hook registration	`.claude/settings.json`	`ao hooks uninstall` or delete entries
Git push gate	Pre-push hook (optional, only with CLI)	`AGENTOPS_HOOKS_DISABLED=1`

Nothing modifies your source code.

Troubleshooting: docs/troubleshooting.md

See It Work

Use one skill — validate a PR:

> /council validate this PR

[council] 3 judges spawned (independent, no anchoring)
[judge-1] PASS — token bucket implementation correct
[judge-2] WARN — rate limiting missing on /login endpoint
[judge-3] PASS — Redis integration follows middleware pattern
Consensus: WARN — add rate limiting to /login before shipping
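The transcript above reaches a WARN consensus from two PASS verdicts and one WARN. One rule consistent with that output is "the most severe verdict wins". The sketch below assumes that rule; Verdict and consensus are hypothetical names, and the real council may weigh judges or debate rounds differently.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Ordered by severity so that max() returns the worst verdict."""
    PASS = 0
    WARN = 1
    FAIL = 2


def consensus(verdicts):
    """Return the most severe verdict; an empty council cannot decide."""
    if not verdicts:
        raise ValueError("at least one judge verdict is required")
    return max(verdicts)


# Mirrors the three judges in the transcript above.
result = consensus([Verdict.PASS, Verdict.WARN, Verdict.PASS])
print(result.name)
```

Under this assumed rule a single dissenting judge is enough to raise the consensus, which matches the idea that the council surfaces disagreement rather than averaging it away.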

Parallelize anything with /swarm:

> /swarm "research auth patterns, brainstorm rate limiting improvements"

[swarm] 3 agents spawned — each gets fresh context
[agent-1] /research auth — found JWT + session patterns, 2 prior learnings
[agent-2] /research rate-limiting — found token bucket, middleware pattern
[agent-3] /brainstorm improvements — 4 approaches ranked
[swarm] Complete — artifacts in .agents/

Full pipeline — one command, walk away:

> /rpi "add retry backoff to rate limiter"

[research]    Found 3 prior learnings on rate limiting (injected)
[plan]        2 issues, 1 wave → epic ag-0058
[pre-mortem]  Council validates plan → PASS (knew about Redis choice)
[crank]       Parallel agents: Wave 1 ██ 2/2
[vibe]        Council validates code → PASS
[post-mortem] 2 new learnings → .agents/
[flywheel]    Next: /rpi "add circuit breaker to external API calls"

AgentOps building AgentOps: completed `/crank` across 3 parallel epics (15 issues, 5 waves, 0 regressions).

More examples — /evolve, session continuity

Session continuity across compaction or restart:

> /handoff
[handoff] Saved: 3 open issues, current branch, next action
         Continuation prompt written to .agents/handoffs/

--- next session ---

> /recover
[recover] Found in-progress epic ag-0058 (2/5 issues closed)
          Branch: feature/rate-limiter
          Next: /implement ag-0058.3

Goal-driven improvement loop:

> /evolve --max-cycles=5

[evolve] GOALS.yaml: 4 goals loaded
[cycle-1] Measuring fitness... 2/4 passing
         Worst gap: test-pass-rate (weight: 10)
         /rpi "Improve test-pass-rate" → 3 issues, 2 waves
         Re-measure: 3/4 passing ✓
[cycle-2] Worst gap: doc-coverage (weight: 7)
         /rpi "Improve doc-coverage" → 2 issues, 1 wave
         Re-measure: 4/4 passing ✓
[cycle-3] All goals met. Checking harvested work...
         Picked: "add smoke test for /evolve" (from post-mortem)
[teardown] /post-mortem → 5 learnings extracted

Different developers, different setups — use what fits your workflow

The PR reviewer — uses one skill, nothing else:

> /council validate this PR
Consensus: WARN — missing error handling in 2 locations

That's it. No pipeline, no setup, no commitment. One command, actionable feedback.

The team lead — composes skills manually:

> /research "performance bottlenecks in the API layer"
> /plan "optimize database queries identified in research"
> /council validate the plan

Picks skills as needed, stays in control of sequencing.

The solo dev — runs the full pipeline, walks away:

> /rpi "add user authentication"
[3 phases run autonomously, learnings extracted]

One command does research through post-mortem. Comes back to committed code.

The platform team — parallel agents, hands-free improvement:

> /swarm "run /rpi on each of these 3 epics"
> /evolve --max-cycles=5

Swarms full pipelines in parallel. Evolve measures goals and fixes gaps in a loop.

Not sure which skill to run? See the Skill Router.

Skills

Every skill works alone. Compose them however you want.

Judgment — the foundation everything validates against:

Skill	What it does
`/council`	Independent judges (Claude + Codex) debate, surface disagreement, converge. `--preset=security-audit`, `--perspectives`, `--debate` for adversarial review
`/vibe`	Code quality review — complexity analysis + council
`/pre-mortem`	Validate plans before implementation — council simulates failures
`/post-mortem`	Wrap up completed work — council validates + retro extracts learnings

Execution — research, plan, build, ship:

Skill	What it does
`/research`	Deep codebase exploration — produces structured findings
`/plan`	Decompose a goal into trackable issues with dependency waves
`/implement`	Full lifecycle for one task — research, plan, build, validate, learn
`/crank`	Parallel agents in dependency-ordered waves, fresh context per worker
`/swarm`	Parallelize any skill — run research, brainstorms, implementations in parallel
`/rpi`	Full pipeline: discovery (research + plan + pre-mortem) → implementation (crank) → validation (vibe + post-mortem)
`/evolve`	Measure fitness goals, fix the worst gap, roll back regressions, loop
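The "dependency-ordered waves" that /plan produces and /crank executes can be pictured as layers of a dependency graph: every issue in a wave depends only on issues from earlier waves, so the issues inside one wave can run in parallel. The sketch below is an assumption about the general technique (topological layering), not AgentOps' actual planner; plan_waves and the issue ids are hypothetical.

```python
def plan_waves(dependencies):
    """Group issues into waves so each issue runs after all of its dependencies.

    `dependencies` maps an issue id to the set of issue ids it depends on.
    """
    remaining = {issue: set(deps) for issue, deps in dependencies.items()}
    # Issues that appear only as dependencies still need to be scheduled.
    for deps in list(remaining.values()):
        for dep in deps:
            remaining.setdefault(dep, set())

    waves, done = [], set()
    while remaining:
        # An issue is ready once everything it depends on has completed.
        ready = sorted(i for i, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for issue in ready:
            del remaining[issue]
    return waves


example = {
    "issue-a": set(),
    "issue-b": set(),
    "issue-c": {"issue-a", "issue-b"},
    "issue-d": {"issue-c"},
}
print(plan_waves(example))
```

Here issue-a and issue-b land in the first wave, issue-c in the second, and issue-d in the third.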

Knowledge — the flywheel that makes sessions compound:

Skill	What it does
`/knowledge`	Query learnings, patterns, and decisions across `.agents/`
`/learn`	Manually capture a decision, pattern, or lesson
`/retro`	Extract learnings from completed work
`/flywheel`	Monitor knowledge health — velocity, staleness, pool depths

Supporting skills:

Category	Skills
Onboarding	`/quickstart`, `/using-agentops`
Session	`/handoff`, `/recover`, `/status`
Traceability	`/trace`, `/provenance`
Product	`/product`, `/goals`, `/release`, `/readme`, `/doc`
Utility	`/brainstorm`, `/bug-hunt`, `/complexity`

Full reference: docs/SKILLS.md

Cross-runtime orchestration — mix Claude, Codex, OpenCode

AgentOps orchestrates across runtimes. Claude can lead a team of Codex workers. Codex judges can review Claude's output.

Spawning Backend	How it works	Best for
Native teams	`TeamCreate` + `SendMessage` — built into Claude Code	Tight coordination, debate
Background tasks	`Task(run_in_background=true)` — last-resort fallback	When no team APIs available
Codex sub-agents	`/codex-team` — Claude orchestrates Codex workers	Cross-vendor validation
tmux + Agent Mail	`/swarm --mode=distributed` — full process isolation	Long-running work, crash recovery

Distributed mode workers survive disconnects: each runs in its own tmux session with crash recovery. Run tmux attach to debug a worker live.

Deep Dive

How the knowledge system and pipeline phases work under the hood.

The Knowledge Ledger

.agents/ is an append-only ledger with cache-like semantics. Nothing gets overwritten — every learning, council verdict, pattern, and decision is a new dated file. Freshness decay prunes what's stale. The cycle:

Session N ends
    → ao forge: mine transcript for learnings, decisions, patterns
    → ao maturity --expire: mark stale artifacts (freshness decay)
    → ao maturity --evict: archive what's decayed past threshold

Session N+1 starts
    → ao inject --apply-decay: score all artifacts by recency,
      inject top-N within token budget
    → Agent starts with institutional knowledge, not a blank slate
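The "nothing gets overwritten, every entry is a new dated file" property can be sketched as follows. This is an illustration only: append_learning, the per-kind subdirectory, and the timestamp file name are assumptions, not the layout that ao forge actually writes.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_learning(ledger_dir: Path, kind: str, text: str) -> Path:
    """Write one entry as a new dated file; never overwrite an existing one."""
    target_dir = ledger_dir / kind
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%S.%fZ")
    path = target_dir / f"{stamp}.md"
    # Mode "x" raises FileExistsError instead of clobbering an existing entry.
    with path.open("x", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return path
```

Opening with mode "x" is what makes the write append-only at the file level: a name collision fails loudly rather than silently replacing an older learning.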

Write once, score by freshness, inject the best, prune the rest. If retrieval_rate × usage_rate stays above decay and scale friction, knowledge compounds. If not, growth stalls unless fresh input or stronger controls are added. The formal model is cache eviction with a decay function and limits-to-growth controls.
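As a rough picture of "score by freshness, inject the best", the sketch below ranks artifacts with an exponential decay and greedily fills a token budget. The 30-day half-life, the Artifact fields, and the greedy selection are all assumptions made for illustration; the scoring that ao inject --apply-decay really uses is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Artifact:
    text: str
    age_days: float  # days since the artifact was written
    tokens: int      # estimated prompt cost of injecting it


def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 when new, 0.5 after one half-life."""
    return math.pow(0.5, age_days / half_life_days)


def select_for_injection(artifacts, token_budget, half_life_days=30.0):
    """Greedy pick: freshest first, skipping anything that would exceed the budget."""
    ranked = sorted(
        artifacts,
        key=lambda a: freshness(a.age_days, half_life_days),
        reverse=True,
    )
    chosen, used = [], 0
    for artifact in ranked:
        if used + artifact.tokens <= token_budget:
            chosen.append(artifact)
            used += artifact.tokens
    return chosen
```

A real scorer would likely combine freshness with other signals (the examples above mention confidence and maturity), so treat the single-factor ranking as the simplest possible stand-in.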

  /rpi "goal"
    │
    ├── /research → /plan → /pre-mortem → /crank → /vibe
    │
    ▼
  /post-mortem
    ├── validates what shipped
    ├── extracts learnings → .agents/
    └── suggests next /rpi command ────┐
                                       │
   /rpi "next goal" ◄──────────────────┘

The post-mortem analyzes each learning, asks "what process would this improve?", and writes improvement proposals. It hands you a ready-to-copy /rpi command. Paste it, walk away.

Learnings pass quality gates (specificity, actionability, novelty) and land in tiered pools. Freshness decay ensures recent insights outweigh stale patterns.

Phase details — what each step does

/research — Explores your codebase. Produces a research artifact with findings and recommendations.
/plan — Decomposes the goal into issues with dependency waves. Derives scope boundaries and conformance checks. Creates a beads epic (git-native issue tracking).
/pre-mortem — Judges simulate failures before you write code, including a spec-completeness judge. FAIL? Re-plan with feedback (max 3 retries).
/crank — Spawns parallel agents in dependency-ordered waves. Each worker gets fresh context. Lead validates and commits. Runs until every issue is closed. --test-first for spec-first TDD.
/vibe — Judges validate the code. FAIL? Re-crank with failure context and re-vibe (max 3).
/post-mortem — Council validates the implementation. Retro extracts learnings. Suggests the next /rpi command.

/rpi "goal" runs all six end to end. Use --interactive for human gates at research and plan.
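The retry behavior described for /pre-mortem and /vibe (on FAIL, try again with the failure context, up to 3 times) follows a common gate-with-feedback pattern. The sketch below shows that pattern under one assumption about the count: one initial attempt plus up to max_retries retries. run_with_gate, attempt, and judge are hypothetical names, not AgentOps APIs.

```python
def run_with_gate(attempt, judge, max_retries=3):
    """Retry `attempt` until `judge` passes it, feeding failure context forward.

    `attempt(feedback)` produces a result; `judge(result)` returns
    (passed, feedback). Returns (result, attempts_used).
    """
    feedback = None
    for attempt_number in range(1, max_retries + 2):  # first try + retries
        result = attempt(feedback)
        passed, feedback = judge(result)
        if passed:
            return result, attempt_number
    raise RuntimeError(f"gate still failing after {max_retries} retries: {feedback}")
```

The important detail is that each retry receives the previous judge's feedback, so a re-plan or re-crank is informed by why the last attempt failed rather than starting blind.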

Phased RPI — fresh context per phase for larger goals

ao rpi phased "goal" runs each phase in its own session — no context bleed between phases. Use /rpi when context fits in one session. Use ao rpi phased when you need phase-level resume control. For autonomous control-plane operation, use the canonical path ao rpi loop --supervisor. See The ao CLI for examples.

Goal-driven mode — /evolve with GOALS.yaml

Bootstrap with /goals generate — it scans your repo (PRODUCT.md, README, skills, tests) and proposes mechanically verifiable goals. Or write them by hand:

# GOALS.yaml
version: 1
goals:
  - id: test-pass-rate
    description: "All tests pass"
    check: "make test"
    weight: 10

Then /evolve measures them, picks the worst gap, runs /rpi to fix it, re-measures ALL goals (regressed commits auto-revert), and loops. It commits locally — you control when to push. Kill switch: echo "stop" > ~/.config/evolve/KILL
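The "measure, then pick the worst gap" step can be sketched as running each goal's check command and choosing the failing goal with the highest weight. That selection rule is an assumption that happens to match the transcript earlier (test-pass-rate at weight 10 before doc-coverage at weight 7); run_check, worst_gap, and the "make docs" check are hypothetical, and the real /evolve also handles re-measurement and auto-revert, which are omitted here.

```python
import subprocess


def run_check(command: str) -> bool:
    """A goal passes when its check command exits with status 0."""
    return subprocess.run(command, shell=True, capture_output=True).returncode == 0


def worst_gap(goals, check=run_check):
    """Return the failing goal with the highest weight, or None if all pass."""
    failing = [goal for goal in goals if not check(goal["check"])]
    return max(failing, key=lambda goal: goal["weight"], default=None)


# Goals as they might look after loading GOALS.yaml into plain dicts.
goals = [
    {"id": "test-pass-rate", "check": "make test", "weight": 10},
    {"id": "doc-coverage", "check": "make docs", "weight": 7},  # hypothetical check
]
```

The check parameter exists so the selection logic can be exercised with a stub instead of real shell commands.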

Maintain over time: /goals shows pass/fail status, /goals prune finds stale or broken checks.

References — science, systems theory, prior art

Built on Darr 1995 (decay rates), Sweller 1988 (cognitive load), Liu et al. 2023 (lost-in-the-middle), MemRL 2025 (RL for memory).

AgentOps concentrates on the high-leverage end of Meadows' hierarchy: information flows (#6), rules (#5), self-organization (#4), goals (#3). The bet: changing the loop beats tuning the output.

Deep dive: docs/how-it-works.md — Brownian Ratchet, Ralph Wiggum Pattern, agent backends, hooks, context windowing.

The `ao` CLI

Skills work standalone — no CLI required. The ao CLI adds two things: (1) the knowledge flywheel that makes sessions compound (extract, inject, decay, maturity), and (2) terminal-based RPI that runs without an active chat session. Each phase gets its own fresh context window, so large goals don't hit context limits.

ao rpi loop --supervisor --max-cycles 1        # Canonical autonomous cycle (policy-gated landing)
ao rpi loop --supervisor "fix auth bug"        # Single explicit-goal supervised cycle
ao rpi phased --from=implementation "ag-058"   # Resume a specific phased run at build phase
ao rpi status --watch                          # Monitor active/terminal runs

Walk away, come back to committed code + extracted learnings.

Supervisor determinism contract: task failures mark queue entries failed, infrastructure failures leave queue entries retryable, and ao rpi cancel ignores stale supervisor lease metadata. For recovery/hygiene, pair ao rpi cancel with ao rpi cleanup --all --prune-worktrees --prune-branches.
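The first two clauses of that contract amount to a small outcome mapping, sketched below. All names (EntryState, TaskFailure, InfrastructureFailure, settle) are hypothetical and chosen for illustration; how the ao supervisor classifies failures internally is not specified here.

```python
from enum import Enum


class EntryState(Enum):
    DONE = "done"
    FAILED = "failed"        # task-level failure: not retried automatically
    RETRYABLE = "retryable"  # infrastructure failure: safe to pick up again


class TaskFailure(Exception):
    """The work itself failed."""


class InfrastructureFailure(Exception):
    """The environment failed while the work was running."""


def settle(run_entry) -> EntryState:
    """Run one queue entry and map its outcome onto the contract's states."""
    try:
        run_entry()
    except TaskFailure:
        return EntryState.FAILED
    except InfrastructureFailure:
        return EntryState.RETRYABLE
    return EntryState.DONE
```

Keeping the two failure kinds as separate exception types is what makes the outcome deterministic: the same kind of failure always lands the entry in the same state.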

ao search "query"      # Search knowledge across files and chat history
ao demo                # Interactive demo

Full reference: CLI Commands

Architecture

Five pillars, one recursive shape. The same pattern — lead decomposes work, workers execute atomically, validation gates lock progress, next wave begins — repeats at every scale:

/implement ── one worker, one issue, one verify cycle
    └── /crank ── waves of /implement (FIRE loop)
        └── /rpi ── research → plan → crank → validate → learn
            └── /evolve ── fitness-gated /rpi cycles

Each level treats the one below as a black box: spec in, validated result out. Workers get fresh context per wave (Ralph Wiggum Pattern), never commit (lead-only), and communicate through the filesystem — not accumulated chat context. Parallel execution works because each unit of work is atomic: no shared mutable state with concurrent workers.

Validation is mechanical, not advisory. Multi-model councils judge before and after implementation. Hooks enforce gates — push blocked until /vibe passes, /crank blocked until /pre-mortem passes. The knowledge flywheel extracts learnings, scores them, and re-injects them at session start so each cycle compounds.

Full treatment: docs/ARCHITECTURE.md — all five pillars, operational invariants, component overview.

How AgentOps Fits With Other Tools

These are fellow experiments in making coding agents work. Use pieces from any of them.

Alternative	What it does well	Where AgentOps focuses differently
GSD	Clean subagent spawning, fights context rot	Cross-session memory (GSD keeps context fresh within a session; AgentOps carries knowledge between sessions)
Compound Engineer	Knowledge compounding, structured loop	Multi-model councils and validation gates — independent judges debating before and after code ships

Detailed comparisons →

FAQ

docs/FAQ.md — comparisons, limitations, subagent nesting, PRODUCT.md, uninstall.

Built on — Ralph Wiggum, Multiclaude, beads, CASS, MemRL

Ralph Wiggum (fresh context per agent) · Multiclaude (validation gates) · beads (git-native issues) · CASS (session search) · MemRL (cross-session memory)

Contributing

Issue tracking — Beads / bd

Git-native issues in .beads/. bd onboard (setup) · bd ready (find work) · bd show <id> · bd close <id> · bd sync. More: AGENTS.md

See CONTRIBUTING.md. If AgentOps helped you ship something, post in Discussions.

License

Apache-2.0 · Docs · How It Works · FAQ · Glossary · Architecture · Configuration · CLI Reference · Changelog

