Purpose: This document explains why AICL evolved from version to version, showing how each change preserved and strengthened the core philosophy.
For developers: Understand the reasoning behind architectural decisions. For researchers: See how theory evolved through practical implementation. For IDE assistants: Know which principles are stable vs. which details changed.
2025-01 v0.1 Original Concept
↓
2025-03 v0.2 Added Gradient Information
↓
2025-10 v0.3 Hierarchical Control
↓
2025-12 v2.1 Semantic Kinematics (EKF/PID)
↓
2026-02 v2.2 Assistive SDK (CURRENT)
Date: 2025-03 Status: Superseded by v0.3
Added three new modules:
- Probe - Cheap feasibility checks providing directional signals
- BudgetTracker - Explicit resource tracking
- StrategySelector - Policy routing based on failure patterns
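As a rough sketch of the three new modules (the class names, method signatures, and signal strings here are illustrative assumptions for exposition, not the actual v0.2 API):

```js
// Illustrative sketches of the three v0.2 modules (hypothetical shapes).

// Probe: a cheap feasibility check that returns a directional signal.
class HitCountProbe {
  test(state) {
    if (state.hits === 0) return 'stuck-at-zero'
    if (state.hits > 30) return 'too-narrow'
    return 'ok'
  }
}

// BudgetTracker: explicit resource accounting with a hard stop.
class BudgetTracker {
  constructor(limit) {
    this.limit = limit
    this.spent = 0
  }
  record(cost) { this.spent += cost }
  shouldStop() { return this.spent >= this.limit }
}

// StrategySelector: route to a policy based on the observed failure.
class StrategySelector {
  select({ failure, policies }) {
    const policy = policies.find(p => p.handles(failure)) ?? policies[0]
    return { policy }
  }
}
```

The point of the sketch is the division of labor: the probe reads, the tracker counts, and the selector routes; none of them decides actions on its own.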
Architecture evolution:
v0.1 (4 modules):
Environment → Policy → Evaluator → Ladder
v0.2 (7 modules):
Environment → Policy → Evaluator → Ladder
↓ ↓ ↓
Probe → BudgetTracker → StrategySelector
Problem 1: Insufficient Gradient Information

```js
// v0.1: Only Ladder provided gradient
const action = policy.decide(state, ladder)
// Ladder level: 0.5 (exploration intensity)
// But: no information about feasibility or direction!
```

Solution: Added Probes
```js
// v0.2: Multiple gradient sources
const probeResults = probes.map(p => p.test(state))
// Probes provide: "too-narrow", "stuck-at-zero", "drop-detected"
const action = policy.decide(state, ladder, probeResults)
```

Problem 2: Unbounded Exploration
```js
// v0.1: No explicit resource limits
while (true) {
  // Could run forever!
  const action = policy.decide(state, ladder)
  state = env.apply(action)
}
```

Solution: Added BudgetTracker
```js
// v0.2: Explicit limits
while (!budget.shouldStop()) {
  const action = policy.decide(state, ladder)
  state = env.apply(action)
  budget.record(cost(action))
}
```

Problem 3: Single Policy Limitation
```js
// v0.1: One policy for all scenarios
const policy = new GenericPolicy()
// What if different problems need different approaches?
```

Solution: Added StrategySelector
```js
// v0.2: Dynamic policy selection
const { policy } = selector.select({
  failure: classifiedFailure,
  ladderLevel: ladder.level(),
  policies: [policyA, policyB, policyC]
})
```

✅ Gradient-Guided: Enhanced from 1 source (Ladder) to 3 sources (Ladder + Probes + Budget)
✅ Bounded: Made explicit through BudgetTracker
✅ Modular: Added components without breaking existing ones
✅ Convergence: Budget provides hard stopping criteria
✅ Hierarchical: (Not yet, but foundation laid)
- Better exploration: Agents had directional signals, not just intensity
- Predictable costs: Budget made resource usage explicit
- Flexibility: StrategySelector enabled multi-domain applications
After implementing v0.2 in the GitHub search demo, we discovered:
- Unclear cost boundaries - LLM calls could happen in Policy, StrategySelector, or FailureClassifier
- Routing overhead - StrategySelector added complexity for single-domain tasks
- Unpredictable LLM usage - Hard to know total cost before running
- Mixed concerns - Policy handled both tactical and strategic decisions
These limitations led to v0.3...
Date: 2025-10 Status: Superseded by v2.1
Architectural shift from flat to hierarchical:
- Split Policy → ProbePolicy (inner loop) + Planner (outer loop)
- Split Budget → ControlBudget with inner/outer layers
- Clarified roles - Reflexive (cheap, frequent) vs. Strategic (expensive, infrequent)
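The budget split above can be pictured as two nested accounts: an outer layer metering expensive strategic calls and an inner layer metering cheap tactical iterations. The sketch below is hypothetical; `DualLayerBudget` and its method names are illustrative stand-ins, not the real `ControlBudget` API.

```js
// Hypothetical dual-layer budget: outer meters LLM calls,
// inner meters cheap tactical iterations. Names are illustrative.
class DualLayerBudget {
  constructor({ outerLimit, innerLimit }) {
    this.outer = { limit: outerLimit, spent: 0 }
    this.inner = { limit: innerLimit, spent: 0 }
  }
  recordOuter(cost) { this.outer.spent += cost }
  recordInner(cost) { this.inner.spent += cost }
  // Reset the inner account whenever the outer loop hands
  // control back to the inner loop.
  resetInner() { this.inner.spent = 0 }
  shouldStopInner() { return this.inner.spent >= this.inner.limit }
  shouldStopOuter() { return this.outer.spent >= this.outer.limit }
}
```

Keeping the two accounts separate is what makes total cost predictable: the outer limit caps LLM calls regardless of how many inner iterations run.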
Architecture evolution:
v0.2 (Flat):
User Input → StrategySelector → Policy → Probe → Environment
↓
BudgetTracker
v0.3 (Hierarchical):
User Input
↓
[Outer Loop - Planner] (2-3 LLM calls)
↓
[Inner Loop - ProbePolicy] (10-50 iterations)
├─ Probe (gradient signals)
├─ Environment (apply actions)
├─ Evaluator (measure progress)
└─ Ladder (adjust intensity)
↓
[Outer Loop - Planner] (evaluate results)
Problem 1: Cost Boundaries Unclear
```js
// v0.2: Where do LLM calls happen?
const { policy } = selector.select(...)   // LLM call?
const action = policy.decide(state)       // LLM call?
const failure = classifier.classify(...)  // LLM call?
// Total cost: ??? (unpredictable!)
```

Solution: Explicit Layers
```js
// v0.3: Clear separation

// Outer loop (expensive, 2-3 calls):
const initialState = await planner.plan(input)  // LLM call #1

// Inner loop (cheap, 10-50 iterations):
for (let t = 0; t < maxSteps; t++) {
  const probeResults = probes.map(p => p.test(state))  // 0.05 units
  const action = probePolicy.decide(state, ladder)     // 0.1 units (no LLM!)
  state = env.apply(action)
}

// Back to outer loop:
const output = await planner.evaluate(state)  // LLM call #2

// Total cost: 2-3 LLM calls + (10-50 × 0.15) = 5-8 units (predictable!)
```

Problem 2: Mixed Concerns
```js
// v0.2: Policy did everything
class Policy {
  async decide(state, ladder) {
    // Strategic reasoning (expensive)
    const strategy = await this.llm.plan(state)
    // Tactical adjustment (cheap)
    const action = this.adjustFilters(state, strategy)
    // Mixed concerns!
    return action
  }
}
```

Solution: Separate Responsibilities
```js
// v0.3: Clear separation
class Planner {
  async plan(userInput) {
    // Strategic: Create initial exploration strategy
    return await this.llm.generatePlan(userInput)
  }
}

class ProbePolicy {
  decide(state, ladder) {
    // Tactical: Quick adjustments using gradients
    if (state.hits > 30) return this.narrow(state.filters)
    if (state.hits < 10) return this.broaden(state.filters)
    return { type: 'done' }
  }
}
```

Problem 3: Routing Overhead
```js
// v0.2: StrategySelector for every decision
for (let t = 0; t < maxSteps; t++) {
  const { policy } = selector.select(...)  // Overhead!
  const action = policy.decide(state)
}
```

Solution: Single ProbePolicy for Inner Loop
```js
// v0.3: One policy, many iterations
const probePolicy = new DeterministicSearchPolicy()
probePolicy.initialize(initialState)
for (let t = 0; t < maxSteps; t++) {
  const action = probePolicy.decide(state, ladder)  // No routing!
  state = env.apply(action)
  if (probePolicy.isStable(state)) break
}
```

✅ Gradient-Guided: Now multi-dimensional (Ladder + Probes + History)
✅ Hierarchical: Made explicit through inner/outer loops
✅ Modular: Enhanced - ProbePolicy and Planner are independently replaceable
✅ Bounded: Dual-layer budgets (inner + outer)
✅ Convergence: Explicit through isStable() + budget
Benchmark Results (GitHub Search):
| Metric | v0.2 (Flat) | v0.3 (Hierarchical) | Change |
|---|---|---|---|
| LLM Calls | 2-10 (unpredictable) | 2-3 (predictable) | ✅ Predictable |
| API Calls | 2 | 5 | |
| Repos Found | 3 | 10 | ✅ 3.3x better |
| Success Rate | 60% | 85% | ✅ +25% |
| Total Cost | 4-12 units | 5-8 units | ✅ Predictable |
| Duration | 15s | 25s | |
Key Insights:
- Same LLM cost, better coverage through systematic exploration
- Predictable resource usage enables autonomous operation
- Deterministic inner loop enables reproducibility
In v0.3, these components moved to "optional meta-control":
- StrategySelector - Only needed for multi-domain scenarios
- FailureClassifier - Only needed for complex failure modes
- TerminationPolicy - Only needed for multi-objective optimization
Why: Most applications work fine with single ProbePolicy + Planner. Advanced scenarios can add these when needed.
v0.1 → v0.2: Added gradient information when a single Ladder proved insufficient
v0.2 → v0.3: Added hierarchy when the flat architecture showed cost issues
Principle: Don't over-engineer upfront. Let real problems guide evolution.
What stayed constant:
- Gradient-guided exploration
- Modular separation of concerns
- Bounded sustainability
- Convergence through stability
What changed:
- Number of modules (4 → 7 → 8)
- Architecture (flat → hierarchical)
- Specific interfaces
Principle: Core philosophy is timeless. Implementation adapts to reality.
v0.1: Implicit resource limits → v0.2: Explicit BudgetTracker
v0.2: Unclear LLM usage → v0.3: Explicit inner/outer loops
Principle: Make costs, boundaries, and responsibilities explicit.
v0.2 limitations were discovered through the GitHub search implementation.
v0.3 design was validated through comparative benchmarks.
Principle: Theory guides design, but practice reveals truth.
Date: 2026-02 Status: CURRENT
Architectural shift from framework-first to agent-first:
- New primary API — `cyberloop(agent, opts)` wraps any agent with control
- Middleware system — Composable `beforeStep`/`afterStep` hooks replace monolithic Orchestrator wiring
- Agent protocol — `AgentLike` (opaque) and `SteppableAgent` (step-level) interfaces
- Built-in middleware — `budgetMiddleware`, `telemetryMiddleware`, `stagnationMiddleware`, `probeMiddleware`, `evaluatorMiddleware`, `policyMiddleware`
- Advanced middleware — `kinematicsMiddleware` wraps PhysicsEngine + PIDController from v2.1
- Backward compatible — `Orchestrator` and all v2.1 components preserved
Architecture evolution:
v2.1 (Framework-first):
User Code
↓
Orchestrator (coordinates everything)
├─ ProbePolicy / KinematicProbePolicy
├─ Planner
├─ Probes, Evaluator, Ladder
└─ ControlBudget
v2.2 (Agent-first):
User Code
↓
cyberloop(agent, { middleware: [...] })
├─ MiddlewareRunner (beforeStep / afterStep)
│ ├─ budgetMiddleware (auto)
│ ├─ policyMiddleware (guards + reflexes + base policy)
│ ├─ kinematicsMiddleware (EKF/PID)
│ └─ telemetryMiddleware
└─ SteppableAgent.step() (user-defined)
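The diagram above bottoms out at a user-defined `SteppableAgent.step()`. As a toy illustration of such an agent, the `step()`/`isDone()` shape below follows the interface names used in this document, while the counting logic is purely invented:

```js
// Toy illustration of a step-level agent. The step()/isDone() shape
// follows the SteppableAgent names used in this document; the
// internal counting logic is invented for the example.
class CountdownAgent {
  constructor(target) {
    this.target = target
    this.progress = 0
  }
  // One unit of work per control-loop iteration; middleware would
  // run beforeStep/afterStep hooks around each call.
  async step() {
    this.progress += 1
    return { progress: this.progress }
  }
  isDone() {
    return this.progress >= this.target
  }
}
```

Because the agent exposes its progress one step at a time, the surrounding loop can meter, log, or halt it without the agent knowing anything about budgets or telemetry.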
Problem 1: High adoption barrier
```js
// v2.1: User must learn 7+ interfaces to get started
const orchestrator = new Orchestrator({
  env, evaluator, ladder, budget, selector, probes,
  policies: [new KinematicProbePolicy(embedder, engine, pid)],
})
// Steep learning curve for simple use cases
```

Solution: Progressive disclosure
```js
// v2.2 Tier 1: Wrap any agent in one line
const controlled = cyberloop(myAgent, { budget: { maxSteps: 20 } })
```

```js
// v2.2 Tier 2: Opt into step-level middleware when ready
const controlled = cyberloop(mySteppableAgent, {
  middleware: [telemetryMiddleware(logger)],
})
```

Problem 2: Monolithic control wiring
```js
// v2.1: All control logic hardwired in Orchestrator.run()
// Adding a new concern (e.g., stagnation detection) requires
// modifying the Orchestrator or creating a new one
```

Solution: Composable middleware
```js
// v2.2: Each concern is an independent middleware
const controlled = cyberloop(agent, {
  middleware: [
    stagnationMiddleware({ maxStagnantSteps: 5 }),
    telemetryMiddleware(logger),
    kinematicsMiddleware({ embedder, goalEmbedding, ... }),
  ],
})
// Add/remove/reorder without touching framework internals
```

Problem 3: Policy stack wiring exposed to users
```js
// v2.1: User manually constructs ChainPolicy
const chain = new ChainPolicy(basePolicy, [guard1, guard2], [reflex1])
const action = await chain.decide(state, ladder)
```

Solution: policyMiddleware
```js
// v2.2: Declarative policy configuration
const { middleware, decideAction } = policyMiddleware({
  basePolicy, guards: [guard1, guard2], reflexes: [reflex1], ladder,
})
// Call decideAction(state) inside step(); the middleware handles the lifecycle
```

✅ Gradient-Guided: Middleware provides composable gradient sources (probes, evaluators, kinematics)
✅ Hierarchical: Three tiers (opaque → steppable → advanced) mirror inner/outer loop separation
✅ Modular: Middleware is the ultimate modular separation — each concern is a plug-in
✅ Bounded: budgetMiddleware auto-registered by default; hard limits always enforced
✅ Convergence: isDone() + budget halting provide explicit stopping criteria
- 337 tests across 28 files, all passing
- Zero breaking changes to existing Orchestrator API
- Three new standalone examples demonstrating progressive adoption
- Six revised examples (GitHub + Wikipedia) showing migration path
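The middleware contract itself is not spelled out in this document. In the hypothetical sketch below, the `{ beforeStep, afterStep }` object shape and the `ctx` fields are assumptions inferred from the hook names, not the documented v2.2 API:

```js
// Hypothetical custom middleware: counts steps and asks the runner
// to halt after a cap. The { beforeStep, afterStep } shape and the
// ctx fields are assumptions, not the real v2.2 API.
function stepCapMiddleware(maxSteps) {
  let steps = 0
  return {
    beforeStep(ctx) {
      steps += 1
      if (steps > maxSteps) ctx.halt = true  // signal the runner to stop
    },
    afterStep(ctx) {
      ctx.telemetry = { ...ctx.telemetry, steps }
    },
  }
}
```

Whatever the exact contract, the design intent stated above holds: each concern lives in one small closure that can be added, removed, or reordered without touching framework internals.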
Candidates for addition:
- Beam search middleware - Parallel candidate exploration
- Query memoization middleware - Avoid redundant exploration
- Adaptive threshold middleware - Learn stability criteria dynamically
- Multi-objective evaluation - Balance multiple goals
- Outer loop middleware - Planner integration via middleware chain
What will NOT change:
- The five core pillars (see PHILOSOPHY.md)
- Hierarchical inner/outer architecture
- Explicit cost control
- Modular interfaces
- Backward compatibility with Orchestrator API
- Preserve core philosophy - Five pillars are immutable
- Validate through benchmarks - Theory must meet practice
- Document reasoning - Update this file with each version
- Maintain backward compatibility - When possible, provide migration paths
- Keep it simple - Add complexity only when justified by real problems
Ask yourself:
- Does this preserve the five core pillars? (Gradient-guided, hierarchical, modular, bounded, convergent)
- Does this solve a real problem? (Not just theoretical elegance, but practical pain points)
- Is this the simplest solution? (Can we achieve the goal with less complexity?)
- Can we benchmark the improvement? (How will we measure success?)
- Does this maintain backward compatibility? (If not, is the breaking change justified?)
- Read PHILOSOPHY.md - Understand what must not change
- Read this document - Understand why things changed
- Read current AICL.md - Understand current state
- Check ADRs - See detailed decision records
AICL has evolved from a simple 4-module feedback loop to a sophisticated middleware-based SDK. Through each evolution:
✅ Core philosophy preserved - Five pillars remain constant
✅ Practical problems solved - Each change addressed real limitations
✅ Complexity justified - Added only when simpler approaches failed
✅ Benchmarks validated - Theory met practice successfully
The framework will continue to evolve, but always guided by the timeless principles in PHILOSOPHY.md.
"Evolution is not about changing what we are, but about becoming more fully what we always were."
Last Updated: 2026-02-14 Next Review: When v2.3 is proposed Maintained by: CyberLoop Project License: Apache-2.0