A self-evolving skill that turns user corrections, preferences, task reflections, repeated mistakes, and workflow lessons into durable agent operating knowledge.
Agent Evolution is a single-skill package for agents that need to improve over time without depending on a pile of manual notes or fragile prompt tweaks.
It gives your agent a practical evolution loop:
Capture signal -> Triage -> Risk-grade -> Store -> Auto-promote safe learnings -> Review risky changes -> Prune stale rules
It supports Codex, Claude Code, OpenClaw, and generic agent environments that can load local SKILL.md-style instructions.
Most agents do not truly improve from experience:
- User preferences are mentioned once, then forgotten.
- Corrections fix the current answer but do not affect future behavior.
- Repeated mistakes keep happening because no rule is promoted.
- Task summaries describe what happened but do not extract reusable lessons.
- Trigger phrases keep growing, but stale or noisy triggers are never cleaned up.
- Memory files become noisy because everything is stored at the same level.
Agent Evolution gives the agent a structured way to decide:
- What should be remembered immediately.
- What should be only a candidate.
- What is safe to auto-promote.
- What must require human confirmation.
- What should be archived or pruned.
Agent Evolution is inspired by Andrej Karpathy's Software 3.0 / LLM OS framing and the broader context engineering discussion: LLM behavior is increasingly shaped by natural-language instructions, context, tools, memory, examples, feedback, evaluation, and pruning, not only by traditional code.
In that framing, an agent's real "program" is not a single prompt. It is the context system around the model:
instructions + memory + tools + examples + feedback + evals + pruning
Agent Evolution turns that idea into a small, operational skill:
- Context is treated as an editable runtime, not a pile of notes.
- User feedback becomes structured operating knowledge.
- Low-risk learnings can be promoted automatically.
- Repeated failures become eval-backed rule candidates.
- Trigger phrases evolve through a lifecycle instead of growing forever.
- Human confirmation stays in the loop for high-impact changes.
This project is not affiliated with or endorsed by Andrej Karpathy, Anthropic, LangChain, or Shopify. It borrows the engineering lens: in the Software 3.0 era, improving an agent means engineering its context, memory, tools, feedback loops, validation surfaces, and pruning mechanisms.
References:
- Andrej Karpathy, Software Is Changing (Again), YC AI Startup School.
- Andrej Karpathy, Software 2.0.
- Tobi Lutke, context engineering over prompt engineering.
- Anthropic, Effective context engineering for AI agents.
- LangChain, Context Engineering for Agents.
- LangChain, How agents can use filesystems for context engineering.
| Capability | What It Handles | Output |
|---|---|---|
| Direct memory | Explicit user preferences such as "remember this" or "always do X" | Stable user memory |
| Task reflection | Completed work, summaries, repeated workflows | Reusable lessons |
| Error learning | User corrections and repeated mistakes | Candidate or promoted rule |
| Tool gotchas | Paths, command failures, environment issues | Tool/workflow memory |
| Risk grading | Low / medium / high risk classification | Auto-promote or review |
| Rule promotion | Stable lessons become durable behavior | Memory or instruction update |
| Trigger governance | Missed triggers and false triggers | Trigger lifecycle management |
| Pruning | Stale, duplicate, or conflicting rules | Keep, merge, demote, archive, remove |
| Project Ledger | High-volume projects, long sessions, multi-task windows | Show projects, outputs, and decisions before memory triage |
Many memory systems mostly catch explicit phrases such as "remember this", "always do X", or "do not do Y again". That is safe, but it misses the bigger question: what work actually happened, what was produced, what decisions were made, and which workflow lessons are reusable.
Project Ledger is Agent Evolution's project outcome layer inside scan reports. Before memory triage, it answers:
- What projects or tasks happened in the scan window?
- What was each project's goal?
- What files, repositories, skills, workflows, reports, media artifacts, or verification results were produced?
- What decisions or constraints did the user confirm?
- Which lessons need review, and which details are task-local?
- Why was an item promoted, written as a candidate, archived, or rejected?
session activity
-> Project Ledger
-> reusable learning hints
-> candidate / durable memory / archive
This keeps Agent Evolution from becoming only a keyword scanner. It first makes real project outcomes visible, then decides what deserves memory.
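As an illustration of the questions above, a ledger entry could be modeled as a small record that is rendered before any memory decision is made. This is a minimal sketch, not the project's actual format: `LedgerEntry`, its field names, and `render_ledger` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    """One project or task seen in the scan window (hypothetical shape)."""
    project: str
    goal: str
    outputs: list[str] = field(default_factory=list)     # files, repos, reports, ...
    decisions: list[str] = field(default_factory=list)   # constraints the user confirmed
    lessons: list[str] = field(default_factory=list)     # reusable learning hints
    task_local: list[str] = field(default_factory=list)  # details that stay out of memory


def render_ledger(entries: list[LedgerEntry]) -> str:
    """Render entries as plain text so outcomes are visible before triage."""
    lines = []
    for e in entries:
        lines.append(f"Project: {e.project}")
        lines.append(f"  Goal: {e.goal}")
        lines.append(f"  Outputs: {', '.join(e.outputs) or 'none'}")
        lines.append(f"  Decisions: {', '.join(e.decisions) or 'none'}")
        lines.append(f"  Lessons for review: {len(e.lessons)}")
    return "\n".join(lines)
```

Keeping `lessons` separate from `task_local` mirrors the distinction the ledger draws between reusable learning hints and details that matter only to one task.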
flowchart TD
A["User signal<br/>preference, correction, reflection, repeated failure"] --> B["Triage<br/>classify signal type and risk"]
B --> C{"Explicit low-risk preference?"}
C -->|Yes| D["Direct memory<br/>store without slow validation"]
C -->|No| E{"Repeated or high impact?"}
E -->|Yes| F["Candidate review<br/>write to evolution-candidates.md"]
E -->|No| G["Lightweight learning<br/>keep as task reflection or low-risk memory"]
F --> H["Validate<br/>manual review or eval loop"]
H --> I{"Stable and useful?"}
I -->|Yes| J["Promote<br/>write durable operating rule"]
I -->|No| K["Archive or discard<br/>avoid noisy memory growth"]
D --> L["Apply in future work"]
G --> L
J --> L
L --> M["Prune<br/>merge, demote, archive, or remove stale rules"]
M --> B
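The first two decision points in the flowchart can be sketched as a small routing function. This is a simplified illustration under stated assumptions: the `Signal` fields and the returned labels are hypothetical, and the real skill expresses this logic as natural-language instructions rather than code.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """A captured user signal (hypothetical fields for illustration)."""
    text: str
    explicit_preference: bool = False  # e.g. "remember this", "always do X"
    low_risk: bool = True
    repeated: bool = False
    high_impact: bool = False


def triage(signal: Signal) -> str:
    """Return the next step for a signal, following the flowchart branches."""
    if signal.explicit_preference and signal.low_risk:
        return "direct-memory"          # stored without slow validation
    if signal.repeated or signal.high_impact:
        return "candidate-review"       # goes to evolution-candidates.md
    return "lightweight-learning"       # task reflection or low-risk memory
```

For example, `triage(Signal("always use tabs", explicit_preference=True))` takes the direct-memory branch, while a repeated mistake is routed to candidate review.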
Agent Evolution has three startup levels:
metadata-trigger
The host loads the skill when SKILL.md matches the user request.
opportunistic-self-start
Once loaded, the skill runs lightweight checks after significant work, corrections, tool failures, or repeated issues.
scheduled-reflection-adapter
Optional background scan through Codex automation, cron, heartbeat, hooks, or another host scheduler.
When installed in Codex, Agent Evolution can create a 6-hour graded scan automation:
Every 6 hours
-> scan recent bounded session logs
-> extract useful learnings
-> auto-promote low-risk learnings
-> write medium/high-risk items to review candidates
-> never auto-edit global rules, skills, external systems, or secrets
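The "recent bounded session logs" step amounts to filtering events by a time window. A minimal sketch, assuming each event is a dict with a timezone-aware `timestamp` key; that shape and the function name `in_scan_window` are assumptions made for this example, not the format of any real host log.

```python
from datetime import datetime, timedelta, timezone

SCAN_WINDOW = timedelta(hours=6)  # matches the 6-hour scan interval


def in_scan_window(events, now=None):
    """Keep only events whose timestamp falls within the last six hours."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - SCAN_WINDOW
    return [e for e in events if cutoff < e["timestamp"] <= now]
```

Bounding the window keeps each scan cheap and avoids re-processing sessions that an earlier run already covered.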
Agent Evolution does not treat all learnings equally.
Low-risk learnings can be automatically written to evolution.md.
Examples:
- Explicit user preferences.
- User-corrected low-risk behavior.
- Stable local path or tool gotchas.
- Repeated small workflow mistakes with clear fixes.
Example:
Rule:
- When installing user-managed skills, default to `~/.agents/skills` unless the user explicitly names another directory.
Medium-risk learnings go to evolution-candidates.md.
Examples:
- Changes to default workflow.
- Skill routing or trigger phrasing changes.
- Behavior that affects several task types.
- Inferred patterns without explicit user confirmation.
High-risk learnings are never auto-promoted.
Examples:
- File deletion or overwrite behavior.
- GitHub push, publish, sync, or repository changes.
- Feishu/Lark, email, posting, or other external systems.
- Credentials, tokens, cookies, secrets.
- Automation behavior.
- Global instruction files such as AGENTS.md.
- Skill file edits.
- Broad cross-agent behavior changes.
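The three risk tiers can be sketched as a lookup over category labels. The labels below are hypothetical tags derived from the example lists above. Treating unknown or missing categories as medium risk is a conservative assumption of this sketch, not documented behavior.

```python
# Hypothetical category tags derived from the example lists above.
LOW_RISK = {
    "explicit-preference", "low-risk-correction",
    "tool-gotcha", "small-workflow-fix",
}
HIGH_RISK = {
    "file-deletion", "file-overwrite", "repository-change",
    "external-system", "credentials", "automation",
    "global-instructions", "skill-edit", "cross-agent-behavior",
}


def grade_risk(categories: set[str]) -> str:
    """Grade a learning by the riskiest category it touches."""
    if categories & HIGH_RISK:
        return "high"    # never auto-promoted
    if categories and categories <= LOW_RISK:
        return "low"     # eligible for auto-promotion
    return "medium"      # everything else waits for review
```

Grading by the riskiest category means a learning that mixes a harmless preference with a credentials change is still held for review.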
Agent Evolution uses three memory files:
evolution.md
Low-risk auto-promoted learnings.
evolution-candidates.md
Medium/high-risk learnings waiting for review.
evolution-promotions.md
Audit log for automatic promotions.
This creates a feedback loop while keeping high-impact rule changes behind human review.
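The routing between the three files can be sketched as follows. The file names come from the list above; the `memory_dir` parameter, the one-line entry format, and the function names are assumptions made for this example.

```python
from pathlib import Path


def destinations(risk: str) -> list[str]:
    """Map a risk grade to the memory files that should receive the learning."""
    if risk == "low":
        # Auto-promoted learnings are also recorded in the audit log.
        return ["evolution.md", "evolution-promotions.md"]
    # Medium and high risk both wait for review.
    return ["evolution-candidates.md"]


def record(memory_dir: Path, risk: str, rule: str) -> list[Path]:
    """Append the rule to each destination file and return the paths written."""
    written = []
    for name in destinations(risk):
        path = memory_dir / name
        with path.open("a", encoding="utf-8") as fh:
            fh.write(f"- {rule}\n")
        written.append(path)
    return written
```

Appending rather than rewriting keeps earlier entries intact, which matters most for the audit log.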
Agent Evolution is a single-skill repository with a lightweight installer.
The installer uses minimal dependencies:
- bash
- mkdir
- cp
- tar
- curl (only for one-line remote install)
No database. No Docker. No browser automation. No npm install. No external API. No GitHub token.
curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash
The installer:
- Installs the skill to:
~/.agents/skills/agent-evolution
- Creates memory templates:
evolution.md
evolution-candidates.md
evolution-promotions.md
- Detects supported host environments:
- Codex
- Claude Code
- OpenClaw
- Generic CLI
- For Codex, creates a 6-hour graded scan automation:
~/.codex/automations/agent-evolution-graded-scan/automation.toml
- For Claude Code, OpenClaw, and generic environments, installs adapter prompts and memory templates.
| Platform | Core Skill | Memory Templates | 6-Hour Background Scan | Low-Risk Auto-Promotion |
|---|---|---|---|---|
| Codex | yes | yes | yes, via Codex automation | yes |
| OpenClaw | yes | yes | if host scheduler is available | yes, when scheduled |
| Claude Code | yes | yes | if hooks or cron are available | yes, when scheduled |
| Generic CLI | yes | yes | only with AGENT_EVOLUTION_SCAN_COMMAND | command-dependent |
Background self-running is a host capability.
The skill provides adapters and templates, but each platform must have a way to run scheduled jobs.
After installation, run:
~/.agents/skills/agent-evolution/scripts/verify-install.sh
Then start a fresh agent session and test:
Use agent-evolution: remember that my writing style is direct and example-driven. Do not write files; just explain how you would handle this memory.
Expected behavior:
Path: direct memory
Validation: not required
Destination: host agent user memory
Remember: my writing style is direct, practical, and avoids marketing language.
Expected handling:
Type: preference
Risk: low
Action: store as user memory
You made the same directory-sync mistake again. Do not use the old sync logic anymore.
Expected handling:
Type: correction
Risk: low or medium depending on scope
Action: auto-promote if local and explicit; otherwise write candidate
Summarize what should be learned from this task.
Expected output:
## Evolution Reference
- Reusable learning:
- User preference:
- Tool or environment gotcha:
- Next time avoid:
- Suggested rule update:
From now on, automatically delete old duplicate skills.
Expected handling:
Risk: high
Action: write candidate only
Reason: deletion behavior requires confirmation
agent-evolution/
├── SKILL.md
├── install.sh
├── README.md
├── README.zh.md
├── LICENSE
├── templates/
│ ├── evolution.md
│ ├── evolution-candidates.md
│ ├── evolution-promotions.md
│ ├── codex-automation.toml
│ └── generic-scan-prompt.md
├── adapters/
│ ├── codex.md
│ ├── claude-code.md
│ └── openclaw.md
├── scripts/
│ ├── detect-platform.sh
│ ├── install-codex.sh
│ ├── install-claude-code.sh
│ ├── install-openclaw.sh
│ ├── install-generic-cron.sh
│ ├── verify-install.sh
│ ├── log-event.mjs
│ ├── promote-rule.mjs
│ └── prune-rules.mjs
├── references/
│ ├── direct-memory.md
│ ├── eval-loop.md
│ ├── memory-layers.md
│ ├── promotion.md
│ ├── pruning.md
│ ├── reflection.md
│ ├── safety.md
│ ├── self-start.md
│ ├── storage-routing.md
│ ├── triage.md
│ ├── trigger-evolution.md
│ └── trigger-registry.md
└── evals/
  └── evals.json
Agent Evolution will not automatically:
- Store secrets, tokens, cookies, passwords, or private keys.
- Delete files.
- Overwrite existing files.
- Push, publish, sync, or change GitHub repositories.
- Operate external systems such as Feishu/Lark, email, social platforms, or production services.
- Change global instruction files.
- Edit skill files.
- Change automation behavior.
- Promote broad behavior changes without review.
High-impact changes are written to candidates and require confirmation.
Re-run the installer:
curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash
Then start a fresh agent session if your host scans skills only at startup.
See LICENSE.