Agent Evolution

A self-evolving skill that turns user corrections, preferences, task reflections, repeated mistakes, and workflow lessons into durable agent operating knowledge.


Agent Evolution is a single-skill package for agents that need to improve over time without depending on a pile of manual notes or fragile prompt tweaks.

It gives your agent a practical evolution loop:

Capture signal -> Triage -> Risk-grade -> Store -> Auto-promote safe learnings -> Review risky changes -> Prune stale rules

It supports Codex, Claude Code, OpenClaw, and generic agent environments that can load local SKILL.md-style instructions.

What Problem It Solves

Most agents do not truly improve from experience:

User preferences are mentioned once, then forgotten.
Corrections fix the current answer but do not affect future behavior.
Repeated mistakes keep happening because no rule is promoted.
Task summaries describe what happened but do not extract reusable lessons.
Trigger phrases keep growing, but stale or noisy triggers are never cleaned up.
Memory files become noisy because everything is stored at the same level.

Agent Evolution gives the agent a structured way to decide:

What should be remembered immediately.
What should be only a candidate.
What is safe to auto-promote.
What must require human confirmation.
What should be archived or pruned.

Design Philosophy

Agent Evolution is inspired by Andrej Karpathy's Software 3.0 / LLM OS framing and the broader context engineering discussion: LLM behavior is increasingly shaped by natural-language instructions, context, tools, memory, examples, feedback, evaluation, and pruning, not only by traditional code.

In that framing, an agent's real "program" is not a single prompt. It is the context system around the model:

instructions + memory + tools + examples + feedback + evals + pruning

Agent Evolution turns that idea into a small, operational skill:

Context is treated as an editable runtime, not a pile of notes.
User feedback becomes structured operating knowledge.
Low-risk learnings can be promoted automatically.
Repeated failures become eval-backed rule candidates.
Trigger phrases evolve through a lifecycle instead of growing forever.
Human confirmation stays in the loop for high-impact changes.

This project is not affiliated with or endorsed by Andrej Karpathy, Anthropic, LangChain, or Shopify. It borrows the engineering lens: in the Software 3.0 era, improving an agent means engineering its context, memory, tools, feedback loops, validation surfaces, and pruning mechanisms.

References:

Andrej Karpathy, Software Is Changing (Again), YC AI Startup School.
Andrej Karpathy, Software 2.0.
Tobi Lütke (Shopify), on context engineering over prompt engineering.
Anthropic, Effective context engineering for AI agents.
LangChain, Context Engineering for Agents.
LangChain, How agents can use filesystems for context engineering.

What It Can Do

| Capability | What It Handles | Output |
| --- | --- | --- |
| Direct memory | Explicit user preferences such as "remember this" or "always do X" | Stable user memory |
| Task reflection | Completed work, summaries, repeated workflows | Reusable lessons |
| Error learning | User corrections and repeated mistakes | Candidate or promoted rule |
| Tool gotchas | Paths, command failures, environment issues | Tool/workflow memory |
| Risk grading | Low / medium / high risk classification | Auto-promote or review |
| Rule promotion | Stable lessons become durable behavior | Memory or instruction update |
| Trigger governance | Missed triggers and false triggers | Trigger lifecycle management |
| Pruning | Stale, duplicate, or conflicting rules | Keep, merge, demote, archive, remove |
| Project Ledger | High-volume projects, long sessions, multi-task windows | Show projects, outputs, and decisions before memory triage |

New In This Version: Project Ledger

Many memory systems catch only explicit phrases such as "remember this", "always do X", or "do not do Y again". That is safe, but it misses the bigger questions: what work actually happened, what was produced, which decisions were made, and which workflow lessons are reusable.

Project Ledger is Agent Evolution's project outcome layer inside scan reports. Before memory triage, it answers:

What projects or tasks happened in the scan window?
What was each project's goal?
What files, repositories, skills, workflows, reports, media artifacts, or verification results were produced?
What decisions or constraints did the user confirm?
Which lessons need review, and which details are task-local?
Why was an item promoted, written as a candidate, archived, or rejected?

session activity
  -> Project Ledger
  -> reusable learning hints
  -> candidate / durable memory / archive

This keeps Agent Evolution from becoming only a keyword scanner. It first makes real project outcomes visible, then decides what deserves memory.
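The questions above suggest a simple record per project. The following is a minimal sketch of what a ledger entry could look like; the `LedgerEntry` class, its field names, and the two helper functions are hypothetical illustrations, not the project's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    """One project or task seen in the scan window (hypothetical shape)."""
    project: str
    goal: str
    outputs: list = field(default_factory=list)     # files, repos, reports, verification results
    decisions: list = field(default_factory=list)   # decisions or constraints the user confirmed
    lessons: list = field(default_factory=list)     # lessons that need review
    task_local: list = field(default_factory=list)  # details that should stay out of memory


def render_ledger(entries):
    """Render entries as plain text so outcomes are visible before triage."""
    lines = []
    for e in entries:
        lines.append(f"Project: {e.project}")
        lines.append(f"  Goal: {e.goal}")
        sections = (
            ("Output", e.outputs),
            ("Decision", e.decisions),
            ("Lesson for review", e.lessons),
            ("Task-local", e.task_local),
        )
        for label, items in sections:
            for item in items:
                lines.append(f"  {label}: {item}")
    return "\n".join(lines)


def learning_hints(entries):
    """Only lessons move on to memory triage; task-local details are left behind."""
    return [(e.project, lesson) for e in entries for lesson in e.lessons]
```

Keeping lessons and task-local details in separate fields makes the "what deserves memory" decision explicit instead of leaving it to keyword matching.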

Core Workflow

flowchart TD
  A["User signal<br/>preference, correction, reflection, repeated failure"] --> B["Triage<br/>classify signal type and risk"]
  B --> C{"Explicit low-risk preference?"}
  C -->|Yes| D["Direct memory<br/>store without slow validation"]
  C -->|No| E{"Repeated or high impact?"}
  E -->|Yes| F["Candidate review<br/>write to evolution-candidates.md"]
  E -->|No| G["Lightweight learning<br/>keep as task reflection or low-risk memory"]
  F --> H["Validate<br/>manual review or eval loop"]
  H --> I{"Stable and useful?"}
  I -->|Yes| J["Promote<br/>write durable operating rule"]
  I -->|No| K["Archive or discard<br/>avoid noisy memory growth"]
  D --> L["Apply in future work"]
  G --> L
  J --> L
  L --> M["Prune<br/>merge, demote, archive, or remove stale rules"]
  M --> B
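The decision nodes in the flowchart can be sketched as two small functions. This is an illustrative sketch only: the `Signal` class, its boolean fields, and the returned labels are assumptions made for the example, not names taken from the skill.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """A user signal after triage (hypothetical fields)."""
    text: str
    explicit_preference: bool = False  # e.g. "remember this", "always do X"
    low_risk: bool = True
    repeated: bool = False
    high_impact: bool = False


def route(signal):
    """Mirror the first two decision nodes of the flowchart."""
    if signal.explicit_preference and signal.low_risk:
        return "direct-memory"          # stored without slow validation
    if signal.repeated or signal.high_impact:
        return "candidate-review"       # written to evolution-candidates.md
    return "lightweight-learning"       # kept as task reflection or low-risk memory


def after_validation(stable_and_useful):
    """Mirror the 'Stable and useful?' node that follows candidate validation."""
    return "promote" if stable_and_useful else "archive-or-discard"
```

For example, `route(Signal("always use tabs", explicit_preference=True))` takes the direct-memory path, while a repeated mistake goes to candidate review first.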

Self-Running Mechanism

Agent Evolution has three startup levels:

metadata-trigger
  The host loads the skill when SKILL.md matches the user request.

opportunistic-self-start
  Once loaded, the skill runs lightweight checks after significant work, corrections, tool failures, or repeated issues.

scheduled-reflection-adapter
  Optional background scan through Codex automation, cron, heartbeat, hooks, or another host scheduler.

When installed in Codex, Agent Evolution can create a 6-hour graded scan automation:

Every 6 hours
-> scan recent bounded session logs
-> extract useful learnings
-> auto-promote low-risk learnings
-> write medium/high-risk items to review candidates
-> never auto-edit global rules, skills, external systems, or secrets
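The "recent bounded session logs" step could be sketched as follows. This is not the project's scanner: the `*.log` file pattern, the file-count and byte limits, and both function names are assumptions chosen for illustration. Only the 6-hour window comes from the text above.

```python
import time
from pathlib import Path
from typing import Optional

SCAN_WINDOW_SECONDS = 6 * 60 * 60  # matches the 6-hour schedule
MAX_FILES = 20                     # assumed bound, not from the project
MAX_BYTES_PER_FILE = 64_000        # assumed bound, not from the project


def recent_logs(log_dir: Path, now: Optional[float] = None) -> list:
    """Return log files modified inside the scan window, newest first, capped."""
    now = time.time() if now is None else now
    cutoff = now - SCAN_WINDOW_SECONDS
    recent = [
        p for p in log_dir.glob("*.log")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    recent.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return recent[:MAX_FILES]


def read_tail(path: Path, limit: int = MAX_BYTES_PER_FILE) -> str:
    """Read at most `limit` bytes from the end of a log file."""
    data = path.read_bytes()
    start = max(0, len(data) - limit)
    return data[start:].decode("utf-8", errors="replace")
```

Bounding both the number of files and the bytes read per file keeps each scheduled scan cheap, whatever the session volume.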

Risk-Graded Auto-Promotion

Agent Evolution does not treat all learnings equally.

Low Risk: Auto-Promote

Low-risk learnings can be automatically written to evolution.md.

Examples:

Explicit user preferences.
User-corrected low-risk behavior.
Stable local path or tool gotchas.
Repeated small workflow mistakes with clear fixes.

Example:

Rule:
- When installing user-managed skills, default to `~/.agents/skills` unless the user explicitly names another directory.

Medium Risk: Candidate Review

Medium-risk learnings go to evolution-candidates.md.

Examples:

Changes to default workflow.
Skill routing or trigger phrasing changes.
Behavior that affects several task types.
Inferred patterns without explicit user confirmation.

High Risk: Requires Confirmation

High-risk learnings are never auto-promoted.

Examples:

File deletion or overwrite behavior.
GitHub push, publish, sync, or repository changes.
Feishu/Lark, email, posting, or other external systems.
Credentials, tokens, cookies, secrets.
Automation behavior.
Global instruction files such as AGENTS.md.
Skill file edits.
Broad cross-agent behavior changes.
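The three tiers above can be expressed as a lookup from learning category to risk level. The sketch below is a hedged illustration: the category labels are hypothetical names invented for this example, and treating unknown categories as high risk is a conservative assumption rather than documented behavior. The destination file names are the ones listed in this README.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Category labels are hypothetical; the grouping follows the three lists above.
CATEGORY_RISK = {
    "explicit-preference": Risk.LOW,
    "corrected-low-risk-behavior": Risk.LOW,
    "local-tool-gotcha": Risk.LOW,
    "small-workflow-fix": Risk.LOW,
    "default-workflow-change": Risk.MEDIUM,
    "trigger-change": Risk.MEDIUM,
    "multi-task-behavior": Risk.MEDIUM,
    "inferred-pattern": Risk.MEDIUM,
    "file-deletion-or-overwrite": Risk.HIGH,
    "repository-change": Risk.HIGH,
    "external-system": Risk.HIGH,
    "credentials": Risk.HIGH,
    "automation-behavior": Risk.HIGH,
    "global-instructions": Risk.HIGH,
    "skill-file-edit": Risk.HIGH,
    "cross-agent-change": Risk.HIGH,
}


def grade(category):
    """Unknown categories are treated as high risk (a conservative assumption)."""
    return CATEGORY_RISK.get(category, Risk.HIGH)


def can_auto_promote(risk):
    """Only low-risk learnings are promoted without review."""
    return risk is Risk.LOW


def destination(risk):
    """Low risk goes to evolution.md; everything else waits in the candidates file."""
    return "evolution.md" if risk is Risk.LOW else "evolution-candidates.md"
```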

Installed Memory Files

Agent Evolution uses three memory files:

evolution.md
  Low-risk auto-promoted learnings.

evolution-candidates.md
  Medium/high-risk learnings waiting for review.

evolution-promotions.md
  Audit log for automatic promotions.

This creates a feedback loop while keeping high-impact rules behind human review.
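How the three files could work together is sketched below. The file names come from this README; the function names, the entry format, and the timestamp style are assumptions for illustration and may differ from what `promote-rule.mjs` actually writes.

```python
from datetime import datetime, timezone
from pathlib import Path


def _append(path: Path, line: str) -> None:
    """Append one line; existing content is never overwritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


def promote_low_risk(memory_dir: Path, rule: str, reason: str) -> None:
    """Write a low-risk rule to evolution.md and record it in the audit log."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    _append(memory_dir / "evolution.md", f"- {rule}")
    _append(
        memory_dir / "evolution-promotions.md",
        f"- {stamp} promoted: {rule} (reason: {reason})",
    )


def queue_candidate(memory_dir: Path, rule: str, risk: str) -> None:
    """Medium/high-risk learnings wait for review instead of being promoted."""
    _append(memory_dir / "evolution-candidates.md", f"- [{risk}] {rule}")
```

Append-only writes keep the audit trail intact: every automatic promotion leaves a line in `evolution-promotions.md` that a reviewer can trace back.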

Install

Agent Evolution is a single-skill repository with a lightweight installer.

The installer uses minimal dependencies:

bash
mkdir
cp
tar
curl (only for the one-line remote install)

No database. No Docker. No browser automation. No npm install. No external API. No GitHub token.

One-Line Install

curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash

What The Installer Does

The installer:

Installs the skill to:

~/.agents/skills/agent-evolution

Creates memory templates:

evolution.md
evolution-candidates.md
evolution-promotions.md

Detects supported host environments:

Codex
Claude Code
OpenClaw
Generic CLI

For Codex, creates a 6-hour graded scan automation:

~/.codex/automations/agent-evolution-graded-scan/automation.toml

For Claude Code, OpenClaw, and generic environments, installs adapter prompts and memory templates.

Platform Support

| Platform | Core Skill | Memory Templates | 6-Hour Background Scan | Low-Risk Auto-Promotion |
| --- | --- | --- | --- | --- |
| Codex | yes | yes | yes, via Codex automation | yes |
| OpenClaw | yes | yes | if host scheduler is available | yes, when scheduled |
| Claude Code | yes | yes | if hooks or cron are available | yes, when scheduled |
| Generic CLI | yes | yes | only with `AGENT_EVOLUTION_SCAN_COMMAND` | command-dependent |

Background self-running is a host capability.

The skill provides adapters and templates, but each platform must have a way to run scheduled jobs.
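The background-scan column of the platform table can be read as a small decision function. The sketch below is only an illustration of that table: the platform identifiers and the `has_scheduler` flag are hypothetical, and only the `AGENT_EVOLUTION_SCAN_COMMAND` variable name comes from this README.

```python
import os


def background_scan_available(platform, has_scheduler=False, env=None):
    """Return True when the host can run the scheduled scan, per the table above."""
    env = os.environ if env is None else env
    if platform == "codex":
        return True                      # via Codex automation
    if platform in ("openclaw", "claude-code"):
        return has_scheduler             # host scheduler, hooks, or cron
    if platform == "generic":
        return bool(env.get("AGENT_EVOLUTION_SCAN_COMMAND"))
    return False                         # unknown hosts: no background scan assumed
```

Passing `env` explicitly makes the check easy to exercise without touching the real environment.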

Verify Install

After installation, run:

~/.agents/skills/agent-evolution/scripts/verify-install.sh

Then start a fresh agent session and test:

Use agent-evolution: remember that my writing style is direct and example-driven. Do not write files; just explain how you would handle this memory.

Expected behavior:

Path: direct memory
Validation: not required
Destination: host agent user memory

Usage Examples

Remember A Preference

Remember: my writing style is direct, practical, and avoids marketing language.

Expected handling:

Type: preference
Risk: low
Action: store as user memory

Learn From A Correction

You made the same directory-sync mistake again. Do not use the old sync logic anymore.

Expected handling:

Type: correction
Risk: low or medium depending on scope
Action: auto-promote if local and explicit; otherwise write candidate

Reflect After A Task

Summarize what should be learned from this task.

Expected output:

## Evolution Reference

- Reusable learning:
- User preference:
- Tool or environment gotcha:
- Next time avoid:
- Suggested rule update:

Handle A High-Risk Rule

From now on, automatically delete old duplicate skills.

Expected handling:

Risk: high
Action: write candidate only
Reason: deletion behavior requires confirmation

File Structure

agent-evolution/
├── SKILL.md
├── install.sh
├── README.md
├── README.zh.md
├── LICENSE
├── templates/
│   ├── evolution.md
│   ├── evolution-candidates.md
│   ├── evolution-promotions.md
│   ├── codex-automation.toml
│   └── generic-scan-prompt.md
├── adapters/
│   ├── codex.md
│   ├── claude-code.md
│   └── openclaw.md
├── scripts/
│   ├── detect-platform.sh
│   ├── install-codex.sh
│   ├── install-claude-code.sh
│   ├── install-openclaw.sh
│   ├── install-generic-cron.sh
│   ├── verify-install.sh
│   ├── log-event.mjs
│   ├── promote-rule.mjs
│   └── prune-rules.mjs
├── references/
│   ├── direct-memory.md
│   ├── eval-loop.md
│   ├── memory-layers.md
│   ├── promotion.md
│   ├── pruning.md
│   ├── reflection.md
│   ├── safety.md
│   ├── self-start.md
│   ├── storage-routing.md
│   ├── triage.md
│   ├── trigger-evolution.md
│   └── trigger-registry.md
└── evals/
    └── evals.json

Safety Boundaries

Agent Evolution will not automatically:

Store secrets, tokens, cookies, passwords, or private keys.
Delete files.
Overwrite existing files.
Push, publish, sync, or change GitHub repositories.
Operate external systems such as Feishu/Lark, email, social platforms, or production services.
Change global instruction files.
Edit skill files.
Change automation behavior.
Promote broad behavior changes without review.

High-impact changes are written to candidates and require confirmation.
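The first boundary, never storing secrets, could be enforced with a guard that runs before anything is written to memory. This is a minimal sketch under stated assumptions: the patterns below are illustrative heuristics chosen for this example, not the skill's actual checks, and a real filter would need a broader pattern set.

```python
import re

# Illustrative heuristics only; these patterns are assumptions, not the skill's rules.
SECRET_PATTERNS = [
    # PEM-style private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # key = value or key: value assignments for sensitive names
    re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key|cookie)\b\s*[:=]\s*\S+"),
    # GitHub token prefixes (ghp_, gho_, ghu_, ghs_, ghr_)
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}\b"),
]


def looks_like_secret(text):
    """Return True when any heuristic pattern matches the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def safe_to_store(text):
    """Gate memory writes: anything resembling a secret is rejected."""
    return not looks_like_secret(text)
```

A plain preference such as "my writing style is direct" passes the guard, while a line like `password = hunter2` is rejected before it reaches any memory file.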

Update

Re-run the installer:

curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash

Then start a fresh agent session if your host scans skills only at startup.

License

See LICENSE.
