
chemny/agent-evolution


Agent Evolution

A self-evolving skill that turns user corrections, preferences, task reflections, repeated mistakes, and workflow lessons into durable agent operating knowledge.

中文 | English

Agent Evolution is a single-skill package for agents that need to improve over time without depending on a pile of manual notes or fragile prompt tweaks.

It gives your agent a practical evolution loop:

Capture signal -> Triage -> Risk-grade -> Store -> Auto-promote safe learnings -> Review risky changes -> Prune stale rules

It supports Codex, Claude Code, OpenClaw, and generic agent environments that can load local SKILL.md-style instructions.


What Problem It Solves

Most agents do not truly improve from experience:

  • User preferences are mentioned once, then forgotten.
  • Corrections fix the current answer but do not affect future behavior.
  • Repeated mistakes keep happening because no rule is promoted.
  • Task summaries describe what happened but do not extract reusable lessons.
  • Trigger phrases keep growing, but stale or noisy triggers are never cleaned up.
  • Memory files become noisy because everything is stored at the same level.

Agent Evolution gives the agent a structured way to decide:

  • What should be remembered immediately.
  • What should be only a candidate.
  • What is safe to auto-promote.
  • What must require human confirmation.
  • What should be archived or pruned.

Design Philosophy

Agent Evolution is inspired by Andrej Karpathy's Software 3.0 / LLM OS framing and the broader context engineering discussion: LLM behavior is increasingly shaped by natural-language instructions, context, tools, memory, examples, feedback, evaluation, and pruning, not only by traditional code.

In that framing, an agent's real "program" is not a single prompt. It is the context system around the model:

instructions + memory + tools + examples + feedback + evals + pruning

Agent Evolution turns that idea into a small, operational skill:

  • Context is treated as an editable runtime, not a pile of notes.
  • User feedback becomes structured operating knowledge.
  • Low-risk learnings can be promoted automatically.
  • Repeated failures become eval-backed rule candidates.
  • Trigger phrases evolve through a lifecycle instead of growing forever.
  • Human confirmation stays in the loop for high-impact changes.

This project is not affiliated with or endorsed by Andrej Karpathy, Anthropic, LangChain, or Shopify. It borrows the engineering lens: in the Software 3.0 era, improving an agent means engineering its context, memory, tools, feedback loops, validation surfaces, and pruning mechanisms.


What It Can Do

| Capability | What It Handles | Output |
| --- | --- | --- |
| Direct memory | Explicit user preferences such as "remember this" or "always do X" | Stable user memory |
| Task reflection | Completed work, summaries, repeated workflows | Reusable lessons |
| Error learning | User corrections and repeated mistakes | Candidate or promoted rule |
| Tool gotchas | Paths, command failures, environment issues | Tool/workflow memory |
| Risk grading | Low / medium / high risk classification | Auto-promote or review |
| Rule promotion | Stable lessons become durable behavior | Memory or instruction update |
| Trigger governance | Missed triggers and false triggers | Trigger lifecycle management |
| Pruning | Stale, duplicate, or conflicting rules | Keep, merge, demote, archive, remove |
| Project Ledger | High-volume projects, long sessions, multi-task windows | Show projects, outputs, and decisions before memory triage |
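
The Pruning capability in the table above can be illustrated with a minimal Python sketch. It is not the repository's `prune-rules.mjs`; the rule fields (`text`, `last_used`) and the 90-day staleness threshold are assumptions made only for this example. It merges duplicate rules and archives stale ones.

```python
from datetime import date, timedelta


def prune(rules, today, stale_after=timedelta(days=90)):
    """Return (kept, archived) after merging duplicates and archiving stale rules.

    Each rule is a dict with the hypothetical keys "text" and "last_used".
    The 90-day default is an illustrative threshold, not a documented value.
    """
    merged = {}
    for rule in rules:
        # Normalize case and whitespace so near-identical rules collapse to one key.
        key = " ".join(rule["text"].lower().split())
        seen = merged.get(key)
        if seen is None or rule["last_used"] > seen["last_used"]:
            merged[key] = rule  # keep the most recently used copy
    kept, archived = [], []
    for rule in merged.values():
        is_stale = today - rule["last_used"] > stale_after
        (archived if is_stale else kept).append(rule["text"])
    return kept, archived


rules = [
    {"text": "Default to ~/.agents/skills", "last_used": date(2025, 3, 1)},
    {"text": "default to  ~/.agents/skills", "last_used": date(2025, 1, 1)},
    {"text": "Use old sync script", "last_used": date(2024, 6, 1)},
]
kept, archived = prune(rules, today=date(2025, 3, 10))
print(kept)      # ['Default to ~/.agents/skills']
print(archived)  # ['Use old sync script']
```

Archiving rather than deleting keeps the sketch in line with the safety boundaries: nothing is removed from disk without review.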

New In This Version: Project Ledger

Many memory systems catch little beyond explicit phrases such as "remember this", "always do X", or "do not do Y again". That is safe, but it misses the bigger questions: what work actually happened, what was produced, what decisions were made, and which workflow lessons are reusable.

Project Ledger is Agent Evolution's project outcome layer inside scan reports. Before memory triage, it answers:

  • What projects or tasks happened in the scan window?
  • What was each project's goal?
  • What files, repositories, skills, workflows, reports, media artifacts, or verification results were produced?
  • What decisions or constraints did the user confirm?
  • Which lessons need review, and which details are task-local?
  • Why was an item promoted, written as a candidate, archived, or rejected?

session activity
  -> Project Ledger
  -> reusable learning hints
  -> candidate / durable memory / archive

This keeps Agent Evolution from becoming only a keyword scanner. It first makes real project outcomes visible, then decides what deserves memory.
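
To make the ledger questions above concrete, here is a minimal Python sketch of what one ledger entry might hold. The README does not specify a schema, so the class name `LedgerEntry`, its fields, and the report layout are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LedgerEntry:
    """One project or task seen in the scan window (hypothetical schema)."""
    project: str
    goal: str
    outputs: list[str] = field(default_factory=list)    # files, reports, verification results
    decisions: list[str] = field(default_factory=list)  # constraints the user confirmed
    # Each lesson is (text, disposition, reason), e.g. ("...", "candidate", "inferred").
    lessons: list[tuple[str, str, str]] = field(default_factory=list)


def render_ledger(entries: list[LedgerEntry]) -> str:
    """Render entries as a plain-text report, one block per project."""
    lines = []
    for e in entries:
        lines.append(f"Project: {e.project}")
        lines.append(f"  Goal: {e.goal}")
        lines.append(f"  Outputs: {', '.join(e.outputs) or 'none'}")
        lines.append(f"  Decisions: {', '.join(e.decisions) or 'none'}")
        for text, disposition, reason in e.lessons:
            lines.append(f"  Lesson [{disposition}]: {text} ({reason})")
    return "\n".join(lines)


entry = LedgerEntry(
    project="agent-evolution install",
    goal="Install the skill for a user-managed agent",
    outputs=["~/.agents/skills/agent-evolution"],
    decisions=["Default install directory is ~/.agents/skills"],
    lessons=[("Default to ~/.agents/skills", "promoted", "explicit, low risk")],
)
report = render_ledger([entry])
print(report)
```

Recording the disposition and reason next to each lesson is what lets the report answer why an item was promoted, held as a candidate, archived, or rejected.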


Core Workflow

flowchart TD
  A["User signal<br/>preference, correction, reflection, repeated failure"] --> B["Triage<br/>classify signal type and risk"]
  B --> C{"Explicit low-risk preference?"}
  C -->|Yes| D["Direct memory<br/>store without slow validation"]
  C -->|No| E{"Repeated or high impact?"}
  E -->|Yes| F["Candidate review<br/>write to evolution-candidates.md"]
  E -->|No| G["Lightweight learning<br/>keep as task reflection or low-risk memory"]
  F --> H["Validate<br/>manual review or eval loop"]
  H --> I{"Stable and useful?"}
  I -->|Yes| J["Promote<br/>write durable operating rule"]
  I -->|No| K["Archive or discard<br/>avoid noisy memory growth"]
  D --> L["Apply in future work"]
  G --> L
  J --> L
  L --> M["Prune<br/>merge, demote, archive, or remove stale rules"]
  M --> B
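
The decision nodes in the flowchart above can be read as a small routing function. The following Python sketch is one possible reading, not the skill's actual logic: the `Signal` fields and the returned labels are hypothetical, and "high impact" is assumed to mean medium or high risk.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    kind: str              # "preference", "correction", "reflection", "repeated_failure"
    risk: str              # "low", "medium", or "high"
    explicit: bool         # True when the user stated it directly
    repeated: bool = False


def route(signal: Signal) -> str:
    """Return the first destination the flowchart would pick for a signal."""
    # Node C: explicit low-risk preference goes straight to direct memory.
    if signal.kind == "preference" and signal.explicit and signal.risk == "low":
        return "direct-memory"
    # Node E: repeated or high-impact signals go to candidate review.
    # Assumption: medium risk counts as high impact here.
    if signal.repeated or signal.risk in ("medium", "high"):
        return "candidate-review"
    # Node G: everything else stays a lightweight learning.
    return "lightweight-learning"


print(route(Signal("preference", "low", explicit=True)))                 # direct-memory
print(route(Signal("correction", "low", explicit=True, repeated=True)))  # candidate-review
print(route(Signal("reflection", "low", explicit=False)))                # lightweight-learning
```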

Self-Running Mechanism

Agent Evolution has three startup levels:

metadata-trigger
  The host loads the skill when SKILL.md matches the user request.

opportunistic-self-start
  Once loaded, the skill runs lightweight checks after significant work, corrections, tool failures, or repeated issues.

scheduled-reflection-adapter
  Optional background scan through Codex automation, cron, heartbeat, hooks, or another host scheduler.

When installed in Codex, Agent Evolution can create a 6-hour graded scan automation:

Every 6 hours
-> scan recent bounded session logs
-> extract useful learnings
-> auto-promote low-risk learnings
-> write medium/high-risk items to review candidates
-> never auto-edit global rules, skills, external systems, or secrets
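
The scan steps above amount to a bounded filter followed by a split on risk. The Python sketch below illustrates that shape only. It is not the Codex automation itself, and the dict keys `text`, `risk`, and `seen_at` are assumed names.

```python
from datetime import datetime, timedelta, timezone

SCAN_WINDOW = timedelta(hours=6)  # matches the 6-hour cadence described above


def graded_scan(learnings, now):
    """Split learnings from the last scan window into promotions and candidates.

    Each learning is a dict with hypothetical keys "text", "risk", "seen_at".
    """
    promoted, candidates = [], []
    for item in learnings:
        if now - item["seen_at"] > SCAN_WINDOW:
            continue  # outside the bounded window, ignore
        if item["risk"] == "low":
            promoted.append(item["text"])    # safe to auto-promote
        else:
            candidates.append(item["text"])  # medium/high: review only
    return promoted, candidates


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
learnings = [
    {"text": "Default to ~/.agents/skills", "risk": "low", "seen_at": now - timedelta(hours=1)},
    {"text": "Auto-delete duplicate skills", "risk": "high", "seen_at": now - timedelta(hours=2)},
    {"text": "Old path gotcha", "risk": "low", "seen_at": now - timedelta(hours=9)},
]
promoted, candidates = graded_scan(learnings, now)
print(promoted)    # ['Default to ~/.agents/skills']
print(candidates)  # ['Auto-delete duplicate skills']
```

Nothing in the sketch touches global rules, skills, or external systems. It only sorts text into two lists, which mirrors the last line of the scan description.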

Risk-Graded Auto-Promotion

Agent Evolution does not treat all learnings equally.

Low Risk: Auto-Promote

Low-risk learnings can be automatically written to evolution.md.

Examples:

  • Explicit user preferences.
  • User-corrected low-risk behavior.
  • Stable local path or tool gotchas.
  • Repeated small workflow mistakes with clear fixes.

Example:

Rule:
- When installing user-managed skills, default to `~/.agents/skills` unless the user explicitly names another directory.

Medium Risk: Candidate Review

Medium-risk learnings go to evolution-candidates.md.

Examples:

  • Changes to default workflow.
  • Skill routing or trigger phrasing changes.
  • Behavior that affects several task types.
  • Inferred patterns without explicit user confirmation.

High Risk: Requires Confirmation

High-risk learnings are never auto-promoted.

Examples:

  • File deletion or overwrite behavior.
  • GitHub push, publish, sync, or repository changes.
  • Feishu/Lark, email, posting, or other external systems.
  • Credentials, tokens, cookies, secrets.
  • Automation behavior.
  • Global instruction files such as AGENTS.md.
  • Skill file edits.
  • Broad cross-agent behavior changes.
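
As a rough illustration of the three tiers above, the following Python sketch grades a learning by keyword. The term lists are assumptions drawn loosely from the examples in this section. A real grader would need more than substring matching, which misfires on words such as "async".

```python
# Hypothetical term lists, loosely based on the examples in this section.
HIGH_RISK_TERMS = ("delete", "overwrite", "push", "publish", "sync", "email",
                   "token", "cookie", "secret", "credential", "automation",
                   "agents.md", "skill file")
MEDIUM_RISK_TERMS = ("default workflow", "routing", "trigger")


def grade_risk(learning: str, explicit: bool) -> str:
    """Return "low", "medium", or "high" for a learning."""
    text = learning.lower()
    if any(term in text for term in HIGH_RISK_TERMS):
        return "high"
    # Inferred patterns without explicit user confirmation are medium risk.
    if any(term in text for term in MEDIUM_RISK_TERMS) or not explicit:
        return "medium"
    return "low"


print(grade_risk("From now on, automatically delete old duplicate skills.", explicit=True))
print(grade_risk("User seems to prefer short commit messages", explicit=False))
print(grade_risk("Always answer in a direct, example-driven style", explicit=True))
```

Checking the high-risk terms first means an explicit request can never talk its way down to auto-promotion, which is the property the tiers are meant to guarantee.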

Installed Memory Files

Agent Evolution uses three memory files:

evolution.md
  Low-risk auto-promoted learnings.

evolution-candidates.md
  Medium/high-risk learnings waiting for review.

evolution-promotions.md
  Audit log for automatic promotions.

This creates a feedback loop while ensuring the agent never rewrites high-impact rules without review.
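
A minimal Python sketch of how a learning could be routed to these three files follows. The function name `record_learning` and the line formats are assumptions; the actual templates in the repository may differ. The example writes to a temporary directory, not to real memory files.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_learning(memory_dir: Path, rule: str, risk: str) -> Path:
    """Append a learning to the file that matches its risk level."""
    if risk == "low":
        target = memory_dir / "evolution.md"
        # Every automatic promotion also gets an audit-log line.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with (memory_dir / "evolution-promotions.md").open("a", encoding="utf-8") as log:
            log.write(f"- {stamp} auto-promoted: {rule}\n")
    else:
        # Medium and high risk wait for human review.
        target = memory_dir / "evolution-candidates.md"
    with target.open("a", encoding="utf-8") as f:
        f.write(f"- {rule}\n")
    return target


with tempfile.TemporaryDirectory() as tmp:
    memory_dir = Path(tmp)
    low_name = record_learning(memory_dir, "Default to ~/.agents/skills", "low").name
    high_name = record_learning(memory_dir, "Auto-delete duplicate skills", "high").name
    audit_log = (memory_dir / "evolution-promotions.md").read_text(encoding="utf-8")

print(low_name)   # evolution.md
print(high_name)  # evolution-candidates.md
```

Files are opened in append mode so an existing memory file is never overwritten, consistent with the safety boundaries.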


Install

Agent Evolution is a single-skill repository with a lightweight installer.

The installer uses minimal dependencies:

  • bash
  • mkdir
  • cp
  • tar
  • curl only for one-line remote install

No database. No Docker. No browser automation. No npm install. No external API. No GitHub token.

One-Line Install

curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash

What The Installer Does

The installer:

  1. Installs the skill to:
     ~/.agents/skills/agent-evolution
  2. Creates memory templates:
     evolution.md
     evolution-candidates.md
     evolution-promotions.md
  3. Detects supported host environments:
       • Codex
       • Claude Code
       • OpenClaw
       • Generic CLI
  4. For Codex, creates a 6-hour graded scan automation:
     ~/.codex/automations/agent-evolution-graded-scan/automation.toml
  5. For Claude Code, OpenClaw, and generic environments, installs adapter prompts and memory templates.

Platform Support

| Platform | Core Skill | Memory Templates | 6-Hour Background Scan | Low-Risk Auto-Promotion |
| --- | --- | --- | --- | --- |
| Codex | yes | yes | yes, via Codex automation | yes |
| OpenClaw | yes | yes | if host scheduler is available | yes, when scheduled |
| Claude Code | yes | yes | if hooks or cron are available | yes, when scheduled |
| Generic CLI | yes | yes | only with AGENT_EVOLUTION_SCAN_COMMAND | command-dependent |

Background self-running is a host capability.

The skill provides adapters and templates, but each platform must have a way to run scheduled jobs.


Verify Install

After installation, run:

~/.agents/skills/agent-evolution/scripts/verify-install.sh

Then start a fresh agent session and test:

Use agent-evolution: remember that my writing style is direct and example-driven. Do not write files; just explain how you would handle this memory.

Expected behavior:

Path: direct memory
Validation: not required
Destination: host agent user memory

Usage Examples

Remember A Preference

Remember: my writing style is direct, practical, and avoids marketing language.

Expected handling:

Type: preference
Risk: low
Action: store as user memory

Learn From A Correction

You made the same directory-sync mistake again. Do not use the old sync logic anymore.

Expected handling:

Type: correction
Risk: low or medium depending on scope
Action: auto-promote if local and explicit; otherwise write candidate

Reflect After A Task

Summarize what should be learned from this task.

Expected output:

## Evolution Reference

- Reusable learning:
- User preference:
- Tool or environment gotcha:
- Next time avoid:
- Suggested rule update:

Handle A High-Risk Rule

From now on, automatically delete old duplicate skills.

Expected handling:

Risk: high
Action: write candidate only
Reason: deletion behavior requires confirmation

File Structure

agent-evolution/
├── SKILL.md
├── install.sh
├── README.md
├── README.zh.md
├── LICENSE
├── templates/
│   ├── evolution.md
│   ├── evolution-candidates.md
│   ├── evolution-promotions.md
│   ├── codex-automation.toml
│   └── generic-scan-prompt.md
├── adapters/
│   ├── codex.md
│   ├── claude-code.md
│   └── openclaw.md
├── scripts/
│   ├── detect-platform.sh
│   ├── install-codex.sh
│   ├── install-claude-code.sh
│   ├── install-openclaw.sh
│   ├── install-generic-cron.sh
│   ├── verify-install.sh
│   ├── log-event.mjs
│   ├── promote-rule.mjs
│   └── prune-rules.mjs
├── references/
│   ├── direct-memory.md
│   ├── eval-loop.md
│   ├── memory-layers.md
│   ├── promotion.md
│   ├── pruning.md
│   ├── reflection.md
│   ├── safety.md
│   ├── self-start.md
│   ├── storage-routing.md
│   ├── triage.md
│   ├── trigger-evolution.md
│   └── trigger-registry.md
└── evals/
    └── evals.json

Safety Boundaries

Agent Evolution will not automatically:

  • Store secrets, tokens, cookies, passwords, or private keys.
  • Delete files.
  • Overwrite existing files.
  • Push, publish, sync, or change GitHub repositories.
  • Operate external systems such as Feishu/Lark, email, social platforms, or production services.
  • Change global instruction files.
  • Edit skill files.
  • Change automation behavior.
  • Promote broad behavior changes without review.

High-impact changes are written to candidates and require confirmation.
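
The first boundary above, never storing secrets, could be enforced with a screening step before anything is written to memory. The Python sketch below shows the idea. The patterns are illustrative assumptions and far from exhaustive, so a match should block storage but a non-match is not proof that text is safe.

```python
import re

# Illustrative patterns only; real secret detection needs a broader rule set.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|passwd|token|secret|api[_-]?key|cookie)\b\s*[:=]\s*\S+"),
]


def safe_to_store(text: str) -> bool:
    """Return False if the text appears to contain a credential."""
    return not any(pattern.search(text) for pattern in SECRET_PATTERNS)


print(safe_to_store("My writing style is direct and example-driven."))  # True
print(safe_to_store("API_KEY=abc123"))                                  # False
print(safe_to_store("-----BEGIN RSA PRIVATE KEY-----"))                 # False
```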


Update

Re-run the installer:

curl -fsSL https://raw.githubusercontent.com/chemny/agent-evolution/main/install.sh | bash

Then start a fresh agent session if your host scans skills only at startup.


License

See LICENSE.

About

Lets agents distill experience from real projects: memory, retrospectives, error correction, Project Ledger scan reports, and rule promotion.
