Evaluator-Governed Recursive Improvement (EGRI)
Turn ambiguous goals into safe, measurable, rollback-capable recursive improvement systems.
EGRI formalizes the pattern behind autoresearch as a reusable systems primitive:
specify mutable surface → freeze harness → propose mutation → execute under budget
→ score with trusted evaluator → promote / discard / branch → record lineage → repeat
The core insight: autoresearch is not "AI doing ML research." It is a bounded closed-loop optimizer over executable artifacts. That pattern generalizes to any domain where you have:
- A mutable artifact (code, config, prompt, workflow, parameters)
- An executable harness (run candidates repeatably)
- A trusted evaluator (score outcomes reliably)
- Bounded damage (reject, rollback, sandbox bad candidates)
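The closed loop described above can be sketched in a few lines. This is an illustrative hill-climbing skeleton, not the autoany-core orchestration; every name here (`egri_loop`, the lambda arguments) is hypothetical:

```python
import copy
import random

def egri_loop(artifact, mutate, harness, evaluator, constraints, budget):
    """Minimal EGRI sketch: propose -> check constraints -> execute -> score -> promote/discard -> record."""
    best = artifact
    best_score = evaluator(harness(artifact))        # score the baseline first
    ledger = [("baseline", best_score)]              # L: lineage record
    for trial in range(budget):                      # B: hard trial budget
        candidate = mutate(copy.deepcopy(best))      # M: propose a mutation
        if not all(check(candidate) for check in constraints):
            ledger.append((trial, None, False))      # C: hard constraint violated -> discard
            continue
        score = evaluator(harness(candidate))        # H + J: execute, then score
        promoted = score > best_score                # P: greedy promotion policy
        if promoted:
            best, best_score = candidate, score
        ledger.append((trial, score, promoted))
    return best, best_score, ledger

# Toy usage: maximize -(x - 3)^2 by mutating one scalar "artifact".
random.seed(0)
best, score, ledger = egri_loop(
    artifact=0.0,
    mutate=lambda x: x + random.uniform(-1.0, 1.0),
    harness=lambda x: x,                             # identity "execution"
    evaluator=lambda y: -(y - 3.0) ** 2,
    constraints=[lambda x: abs(x) < 100],
    budget=200,
)
```

Even this toy shows the structural invariants: the harness and evaluator never change during the run, every trial lands in the ledger, and nothing is promoted without a score.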
A problem instance is a tuple:
Π = (X, M, H, E, J, C, B, P, L)
| Symbol | Name |
|---|---|
| X | Artifact state space |
| M | Mutation operators |
| H | Immutable harness |
| E | Execution backend |
| J | Evaluator |
| C | Hard constraints |
| B | Budget policy |
| P | Promotion policy |
| L | Ledger |
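One way to carry the tuple around in code is a plain record. This is a sketch whose field names mirror the symbols above; it is not a confirmed autoany-core type:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Sequence

@dataclass
class ProblemInstance:
    """Pi = (X, M, H, E, J, C, B, P, L) as a plain record (illustrative only)."""
    artifact_space: Any                          # X: artifact state space
    mutation_ops: Sequence[Callable]             # M: mutation operators
    harness: Callable                            # H: immutable execution shell
    executor: Callable                           # E: execution backend
    evaluator: Callable                          # J: trusted scorer
    constraints: Sequence[Callable]              # C: hard constraints
    budget: int                                  # B: budget policy, simplified to a trial count
    promotion_policy: Callable                   # P: promote / discard / branch decision
    ledger: list = field(default_factory=list)   # L: trial lineage, starts empty

# Hypothetical instance for a prompt-optimization problem.
spec = ProblemInstance(
    artifact_space="prompt-text",
    mutation_ops=[lambda s: s + " Be concise."],
    harness=lambda s: s,
    executor=lambda f: f(),
    evaluator=len,
    constraints=[lambda s: len(s) < 500],
    budget=50,
    promotion_policy=lambda old, new: new > old,
)
```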
Do not grant an agent more mutation freedom than your evaluator can reliably judge.
Autoany is designed as three layers:
| Layer | Role | What it contains |
|---|---|---|
| autoany-skill | Compiler | Interprets user intent, produces problem-spec, decides runtime components |
| autoany-core | Microkernel | Loop orchestration, ledger, executor/evaluator/selector abstractions |
| problem-instance | Generated | Actual evaluator, harness, artifact space, operators, domain data |
Currently implemented: autoany-skill (the agent skill for problem compilation and scaffolding).
EGRI applies across domains:
- ML Training — train.py mutations, val_bpb evaluator (the original autoresearch)
- RAG Pipelines — retrieval config, chunking, prompts; judge-scored accuracy
- Workflow/Ops — decision graphs, routing policies; replay-evaluated completion rate
- ETL — transform logic, schema mappings; data quality evaluator
- Compiler — pass ordering, codegen flags; benchmark suite evaluator
- UI/Product — copy, layout, flows; A/B or simulator evaluator
- Prompt Engineering — system prompts, few-shot examples; judge-scored accuracy
The autoany/ directory contains a standards-aligned agent skill (skills.sh, agentskills.io) that teaches agents to:
- Compile problems — turn vague goals into formal
problem-spec.yaml - Design evaluators first — before any mutation begins
- Constrain mutation surfaces — smallest viable mutable set
- Build harnesses — immutable execution shells
- Run bounded loops — budget-enforced, ledger-tracked
- Distill strategy — learn from search history, not just individual trials
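A compiled spec for the RAG preset might look roughly like this. The field names below are illustrative only; the skill defines the real schema:

```yaml
# Illustrative sketch -- not the canonical problem-spec.yaml schema
domain: rag
goal: "Improve judge-scored answer accuracy on the frozen eval set"
artifact:
  mutable: [retrieval.yaml, chunking.yaml, prompts/system.txt]
  frozen: [harness/run_eval.py, data/eval_set.jsonl]
evaluator:
  kind: llm_judge
  metric: accuracy
constraints:
  - max_latency_ms: 2000
budget:
  max_trials: 50
promotion:
  mode: sandbox   # propose and execute, but a human reviews before promotion
```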
```bash
# Copy the skill directory to your Claude skills path
cp -r autoany/ ~/.claude/skills/autoany

# Or use the packaged .skill file
# (distribute autoany.skill to other agents)

# Scaffold a new problem instance from a domain preset
python3 autoany/scripts/autoany_init.py my-project --domain rag --path ./projects
```

| Mode | Mutate | Execute | Promote | Use when |
|---|---|---|---|---|
| Suggestion | Propose only | No | No | Evaluator untrusted |
| Sandbox | Yes | Yes | No | Needs human review |
| Auto-promote | Yes | Yes | Yes | Strong evaluator |
| Portfolio | Yes | Yes | Yes | Multiple parallel loops |
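The four modes differ only in which loop stages they permit. A sketch of that gating (hypothetical table, not the autoany-core selector API):

```python
# Which loop stages each operating mode enables (illustrative, not a confirmed API).
MODES = {
    "suggestion":   {"mutate": True, "execute": False, "promote": False},
    "sandbox":      {"mutate": True, "execute": True,  "promote": False},
    "auto_promote": {"mutate": True, "execute": True,  "promote": True},
    "portfolio":    {"mutate": True, "execute": True,  "promote": True},
}

def allowed(mode: str, stage: str) -> bool:
    """Return whether a loop stage may run under the given operating mode."""
    return MODES[mode][stage]
```

The useful property is monotonicity: each mode strictly widens the previous one's permissions, so downgrading a loop (e.g. when evaluator trust drops) can never enable a stage that was previously blocked.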
EGRI loops can also recurse on themselves, with each meta-level optimizing the one below it:

| Level | What it optimizes |
|---|---|
| 0 | The artifact itself |
| 1 | The mutation/search policy |
| 2 | Budget allocation across loops |
| 3 | The organizational rules governing everything |
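A Level 2 controller only has to answer one question: how many trials does each loop get next? A minimal sketch, assuming each loop reports its recent score gain (the function and its inputs are hypothetical):

```python
def allocate_budget(total_trials: int, recent_gain: dict) -> dict:
    """Level 2 sketch: split a trial budget across loops in proportion to recent improvement."""
    # Clamp negative gains to zero; the epsilon keeps stalled loops alive with a tiny share.
    gains = {name: max(g, 0.0) + 1e-6 for name, g in recent_gain.items()}
    norm = sum(gains.values())
    return {name: int(total_trials * g / norm) for name, g in gains.items()}

# Loops that improved recently get most of the next round's trials.
shares = allocate_budget(100, {"rag": 0.4, "etl": 0.1, "prompts": 0.0})
```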
Two adapter crates connect EGRI loops to external systems:

| Crate | Purpose |
|---|---|
| autoany-aios | Connects EGRI loops to the Arcan runtime for execution. Wraps Arcan HTTP sessions so trials run inside managed agent environments. |
| autoany-lago | Persists EGRI trial records to a Lago journal. Stores trials as EventKind::Custom entries with an "egri." prefix for easy querying. |
Both adapter crates are standalone (not part of the workspace) and have their own test suites.
| Module | Role |
|---|---|
| dead_ends.rs | Tracks explored-and-rejected mutation paths to avoid revisiting them |
| stagnation.rs | Detects when the loop stops making progress and triggers policy changes |
| strategy.rs | Distills reusable search strategies from trial history (Level 1 meta-loop) |
| inheritance.rs | Carries learned strategies and dead-end knowledge across independent runs |
- EGRI formal model and doctrine
- Agent skill with problem compilation procedure
- Domain presets (code, RAG, workflow, ETL, UI)
- Scaffold initializer
- Ledger schema
- autoany-core — reusable loop microkernel
- Dead-end tracking, stagnation detection, strategy distillation, cross-run inheritance
- Arcan runtime adapter (autoany-aios)
- Lago persistence adapter (autoany-lago)
- Evaluator abstraction + adapters
- Selector (promotion controller)
- Portfolio manager (Level 2)
License: MIT