Evaluator-Governed Recursive Improvement (EGRI)
Turn ambiguous goals into safe, measurable, rollback-capable recursive improvement systems.
EGRI formalizes the pattern behind autoresearch as a reusable systems primitive:
specify mutable surface → freeze harness → propose mutation → execute under budget
→ score with trusted evaluator → promote / discard / branch → record lineage → repeat
The core insight: autoresearch is not "AI doing ML research." It is a bounded closed-loop optimizer over executable artifacts. That pattern generalizes to any domain where you have:
- A mutable artifact (code, config, prompt, workflow, parameters)
- An executable harness (run candidates repeatably)
- A trusted evaluator (score outcomes reliably)
- Bounded damage (reject, rollback, sandbox bad candidates)
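The closed loop described above can be sketched in a few lines. This is an illustrative hill-climbing skeleton, not the autoany-core orchestration; every name here (`egri_loop`, the lambda arguments) is hypothetical:

```python
import copy
import random

def egri_loop(artifact, mutate, harness, evaluator, constraints, budget):
    """Minimal EGRI sketch: propose -> check constraints -> execute -> score -> promote/discard -> record."""
    best = artifact
    best_score = evaluator(harness(artifact))        # score the baseline first
    ledger = [("baseline", best_score)]              # L: lineage record
    for trial in range(budget):                      # B: hard trial budget
        candidate = mutate(copy.deepcopy(best))      # M: propose a mutation
        if not all(check(candidate) for check in constraints):
            ledger.append((trial, None, False))      # C: hard constraint violated -> discard
            continue
        score = evaluator(harness(candidate))        # H + J: execute, then score
        promoted = score > best_score                # P: greedy promotion policy
        if promoted:
            best, best_score = candidate, score
        ledger.append((trial, score, promoted))
    return best, best_score, ledger

# Toy usage: maximize -(x - 3)^2 by mutating one scalar "artifact".
random.seed(0)
best, score, ledger = egri_loop(
    artifact=0.0,
    mutate=lambda x: x + random.uniform(-1.0, 1.0),
    harness=lambda x: x,                             # identity "execution"
    evaluator=lambda y: -(y - 3.0) ** 2,
    constraints=[lambda x: abs(x) < 100],
    budget=200,
)
```

Even this toy shows the structural invariants: the harness and evaluator never change during the run, every trial lands in the ledger, and nothing is promoted without a score.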
A problem instance is a tuple:
Π = (X, M, H, E, J, C, B, P, L)
| Symbol | Name |
|---|---|
| X | Artifact state space |
| M | Mutation operators |
| H | Immutable harness |
| E | Execution backend |
| J | Evaluator |
| C | Hard constraints |
| B | Budget policy |
| P | Promotion policy |
| L | Ledger |
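One way to carry the tuple around in code is a plain record. This is a sketch whose field names mirror the symbols above; it is not a confirmed autoany-core type:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Sequence

@dataclass
class ProblemInstance:
    """Pi = (X, M, H, E, J, C, B, P, L) as a plain record (illustrative only)."""
    artifact_space: Any                          # X: artifact state space
    mutation_ops: Sequence[Callable]             # M: mutation operators
    harness: Callable                            # H: immutable execution shell
    executor: Callable                           # E: execution backend
    evaluator: Callable                          # J: trusted scorer
    constraints: Sequence[Callable]              # C: hard constraints
    budget: int                                  # B: budget policy, simplified to a trial count
    promotion_policy: Callable                   # P: promote / discard / branch decision
    ledger: list = field(default_factory=list)   # L: trial lineage, starts empty

# Hypothetical instance for a prompt-optimization problem.
spec = ProblemInstance(
    artifact_space="prompt-text",
    mutation_ops=[lambda s: s + " Be concise."],
    harness=lambda s: s,
    executor=lambda f: f(),
    evaluator=len,
    constraints=[lambda s: len(s) < 500],
    budget=50,
    promotion_policy=lambda old, new: new > old,
)
```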
Do not grant an agent more mutation freedom than your evaluator can reliably judge.
Autoany is designed as three layers:
| Layer | Role | What it contains |
|---|---|---|
| autoany-skill | Compiler | Interprets user intent, produces problem-spec, decides runtime components |
| autoany-core | Microkernel | Loop orchestration, ledger, executor/evaluator/selector abstractions |
| problem-instance | Generated | Actual evaluator, harness, artifact space, operators, domain data |
Currently implemented: autoany-skill (the agent skill for problem compilation and scaffolding).
EGRI applies across domains:
- ML Training — train.py mutations, val_bpb evaluator (the original autoresearch)
- RAG Pipelines — retrieval config, chunking, prompts; judge-scored accuracy
- Workflow/Ops — decision graphs, routing policies; replay-evaluated completion rate
- ETL — transform logic, schema mappings; data quality evaluator
- Compiler — pass ordering, codegen flags; benchmark suite evaluator
- UI/Product — copy, layout, flows; A/B or simulator evaluator
- Prompt Engineering — system prompts, few-shot examples; judge-scored accuracy
The autoany/ directory contains a standards-aligned agent skill (skills.sh, agentskills.io) that teaches agents to:
- Compile problems — turn vague goals into formal
problem-spec.yaml - Design evaluators first — before any mutation begins
- Constrain mutation surfaces — smallest viable mutable set
- Build harnesses — immutable execution shells
- Run bounded loops — budget-enforced, ledger-tracked
- Distill strategy — learn from search history, not just individual trials
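A compiled spec for the RAG preset might look roughly like this. The field names below are illustrative only; the skill defines the real schema:

```yaml
# Illustrative sketch -- not the canonical problem-spec.yaml schema
domain: rag
goal: "Improve judge-scored answer accuracy on the frozen eval set"
artifact:
  mutable: [retrieval.yaml, chunking.yaml, prompts/system.txt]
  frozen: [harness/run_eval.py, data/eval_set.jsonl]
evaluator:
  kind: llm_judge
  metric: accuracy
constraints:
  - max_latency_ms: 2000
budget:
  max_trials: 50
promotion:
  mode: sandbox   # propose and execute, but a human reviews before promotion
```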
```bash
# Copy the skill directory to your Claude skills path
cp -r autoany/ ~/.claude/skills/autoany

# Or use the packaged .skill file
# (distribute autoany.skill to other agents)

# Scaffold a new problem instance from a domain preset
python3 autoany/scripts/autoany_init.py my-project --domain rag --path ./projects
```

| Mode | Mutate | Execute | Promote | Use when |
|---|---|---|---|---|
| Suggestion | Propose only | No | No | Evaluator untrusted |
| Sandbox | Yes | Yes | No | Needs human review |
| Auto-promote | Yes | Yes | Yes | Strong evaluator |
| Portfolio | Yes | Yes | Yes | Multiple parallel loops |
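The four modes differ only in which loop stages they permit. A sketch of that gating (hypothetical table, not the autoany-core selector API):

```python
# Which loop stages each operating mode enables (illustrative, not a confirmed API).
MODES = {
    "suggestion":   {"mutate": True, "execute": False, "promote": False},
    "sandbox":      {"mutate": True, "execute": True,  "promote": False},
    "auto_promote": {"mutate": True, "execute": True,  "promote": True},
    "portfolio":    {"mutate": True, "execute": True,  "promote": True},
}

def allowed(mode: str, stage: str) -> bool:
    """Return whether a loop stage may run under the given operating mode."""
    return MODES[mode][stage]
```

The useful property is monotonicity: each mode strictly widens the previous one's permissions, so downgrading a loop (e.g. when evaluator trust drops) can never enable a stage that was previously blocked.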
EGRI loops can also recurse on themselves, with each meta-level optimizing the one below it:

| Level | What it optimizes |
|---|---|
| 0 | The artifact itself |
| 1 | The mutation/search policy |
| 2 | Budget allocation across loops |
| 3 | The organizational rules governing everything |
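A Level 2 controller only has to answer one question: how many trials does each loop get next? A minimal sketch, assuming each loop reports its recent score gain (the function and its inputs are hypothetical):

```python
def allocate_budget(total_trials: int, recent_gain: dict) -> dict:
    """Level 2 sketch: split a trial budget across loops in proportion to recent improvement."""
    # Clamp negative gains to zero; the epsilon keeps stalled loops alive with a tiny share.
    gains = {name: max(g, 0.0) + 1e-6 for name, g in recent_gain.items()}
    norm = sum(gains.values())
    return {name: int(total_trials * g / norm) for name, g in gains.items()}

# Loops that improved recently get most of the next round's trials.
shares = allocate_budget(100, {"rag": 0.4, "etl": 0.1, "prompts": 0.0})
```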
Two adapter crates connect EGRI loops to external systems:

| Crate | Purpose |
|---|---|
| autoany-aios | Connects EGRI loops to the Arcan runtime for execution. Wraps Arcan HTTP sessions so trials run inside managed agent environments. |
| autoany-lago | Persists EGRI trial records to a Lago journal. Stores trials as EventKind::Custom entries with an "egri." prefix for easy querying. |
Both adapter crates are standalone (not part of the workspace) and have their own test suites.
| Module | Role |
|---|---|
| dead_ends.rs | Tracks explored-and-rejected mutation paths to avoid revisiting them |
| stagnation.rs | Detects when the loop stops making progress and triggers policy changes |
| strategy.rs | Distills reusable search strategies from trial history (Level 1 meta-loop) |
| inheritance.rs | Carries learned strategies and dead-end knowledge across independent runs |
- EGRI formal model and doctrine
- Agent skill with problem compilation procedure
- Domain presets (code, RAG, workflow, ETL, UI)
- Scaffold initializer
- Ledger schema
- autoany-core — reusable loop microkernel
- Dead-end tracking, stagnation detection, strategy distillation, cross-run inheritance
- Arcan runtime adapter (autoany-aios)
- Lago persistence adapter (autoany-lago)
- Evaluator abstraction + adapters
- Selector (promotion controller)
- Portfolio manager (Level 2)
License: MIT