Skip to content

Latest commit

 

History

History
551 lines (403 loc) · 15.9 KB

File metadata and controls

551 lines (403 loc) · 15.9 KB

AICL Evolution Story

Purpose: This document explains why AICL evolved from version to version, showing how each change preserved and strengthened the core philosophy.

For developers: Understand the reasoning behind architectural decisions. For researchers: See how theory evolved through practical implementation. For IDE assistants: Know which principles are stable vs. which details changed.


Evolution Timeline

2025-01  v0.1  Original Concept
   ↓
2025-03  v0.2  Added Gradient Information
   ↓
2025-10  v0.3  Hierarchical Control
   ↓
2025-12  v2.1  Semantic Kinematics (EKF/PID)
   ↓
2026-02  v2.2  Assistive SDK (CURRENT)

v0.1 → v0.2: Adding Gradient Information

Date: 2025-03 Status: v0.2 superseded by v0.3

What Changed

Added three new modules:

  1. Probe - Cheap feasibility checks providing directional signals
  2. BudgetTracker - Explicit resource tracking
  3. StrategySelector - Policy routing based on failure patterns

Architecture evolution:

v0.1 (4 modules):
Environment → Policy → Evaluator → Ladder

v0.2 (7 modules):
Environment → Policy → Evaluator → Ladder
     ↓          ↓          ↓
  Probe → BudgetTracker → StrategySelector

Why We Evolved

Problem 1: Insufficient Gradient Information

// v0.1: Only Ladder provided gradient
const action = policy.decide(state, ladder)
// Ladder level: 0.5 (exploration intensity)
// But: No information about feasibility or direction!

Solution: Added Probes

// v0.2: Multiple gradient sources
const probeResults = probes.map(p => p.test(state))
// Probes provide: "too-narrow", "stuck-at-zero", "drop-detected"
const action = policy.decide(state, ladder, probeResults)

Problem 2: Unbounded Exploration

// v0.1: No explicit resource limits
while (true) {
  // Could run forever!
  const action = policy.decide(state, ladder)
  state = env.apply(action)
}

Solution: Added BudgetTracker

// v0.2: Explicit limits
while (!budget.shouldStop()) {
  const action = policy.decide(state, ladder)
  state = env.apply(action)
  budget.record(cost(action))
}

Problem 3: Single Policy Limitation

// v0.1: One policy for all scenarios
const policy = new GenericPolicy()
// What if different problems need different approaches?

Solution: Added StrategySelector

// v0.2: Dynamic policy selection
const { policy } = selector.select({
  failure: classifiedFailure,
  ladderLevel: ladder.level(),
  policies: [policyA, policyB, policyC]
})

Philosophy Preserved

Gradient-Guided: Enhanced from 1 source (Ladder) to 3 sources (Ladder + Probes + Budget) ✅ Bounded: Made explicit through BudgetTracker ✅ Modular: Added components without breaking existing ones ✅ Convergence: Budget provides hard stopping criteria ✅ Hierarchical: (Not yet, but foundation laid)

Impact

  • Better exploration: Agents had directional signals, not just intensity
  • Predictable costs: Budget made resource usage explicit
  • Flexibility: StrategySelector enabled multi-domain applications

Limitations Discovered

After implementing v0.2 in the GitHub search demo, we discovered:

  1. Unclear cost boundaries - LLM calls could happen in Policy, StrategySelector, or FailureClassifier
  2. Routing overhead - StrategySelector added complexity for single-domain tasks
  3. Unpredictable LLM usage - Hard to know total cost before running
  4. Mixed concerns - Policy handled both tactical and strategic decisions

These limitations led to v0.3...


v0.2 → v0.3: Hierarchical Control

Date: 2025-10 Status: CURRENT

What Changed

Architectural shift from flat to hierarchical:

  1. Split PolicyProbePolicy (inner loop) + Planner (outer loop)
  2. Split BudgetControlBudget with inner/outer layers
  3. Clarified roles - Reflexive (cheap, frequent) vs. Strategic (expensive, infrequent)

Architecture evolution:

v0.2 (Flat):
User Input → StrategySelector → Policy → Probe → Environment
                                   ↓
                              BudgetTracker

v0.3 (Hierarchical):
User Input
   ↓
[Outer Loop - Planner] (2-3 LLM calls)
   ↓
[Inner Loop - ProbePolicy] (10-50 iterations)
   ├─ Probe (gradient signals)
   ├─ Environment (apply actions)
   ├─ Evaluator (measure progress)
   └─ Ladder (adjust intensity)
   ↓
[Outer Loop - Planner] (evaluate results)

Why We Evolved

Problem 1: Cost Boundaries Unclear

// v0.2: Where do LLM calls happen?
const { policy } = selector.select(...)  // LLM call?
const action = policy.decide(state)      // LLM call?
const failure = classifier.classify(...) // LLM call?

// Total cost: ??? (unpredictable!)

Solution: Explicit Layers

// v0.3: Clear separation
// Outer loop (expensive, 2-3 calls):
const initialState = await planner.plan(input)  // LLM call #1

// Inner loop (cheap, 10-50 iterations):
for (let t = 0; t < maxSteps; t++) {
  const probeResults = probes.map(p => p.test(state))  // 0.05 units
  const action = probePolicy.decide(state, ladder)     // 0.1 units (no LLM!)
  state = env.apply(action)
}

// Back to outer loop:
const output = await planner.evaluate(state)  // LLM call #2

// Total cost: 2-3 LLM calls + (10-50 × 0.15) = 5-8 units (predictable!)

Problem 2: Mixed Concerns

// v0.2: Policy did everything
class Policy {
  async decide(state, ladder) {
    // Strategic reasoning (expensive)
    const strategy = await this.llm.plan(state)

    // Tactical adjustment (cheap)
    const action = this.adjustFilters(state, strategy)

    // Mixed concerns!
  }
}

Solution: Separate Responsibilities

// v0.3: Clear separation
class Planner {
  async plan(userInput) {
    // Strategic: Create initial exploration strategy
    return await this.llm.generatePlan(userInput)
  }
}

class ProbePolicy {
  decide(state, ladder) {
    // Tactical: Quick adjustments using gradients
    if (state.hits > 30) return this.narrow(state.filters)
    if (state.hits < 10) return this.broaden(state.filters)
    return { type: 'done' }
  }
}

Problem 3: Routing Overhead

// v0.2: StrategySelector for every decision
for (let t = 0; t < maxSteps; t++) {
  const { policy } = selector.select(...)  // Overhead!
  const action = policy.decide(state)
}

Solution: Single ProbePolicy for Inner Loop

// v0.3: One policy, many iterations
const probePolicy = new DeterministicSearchPolicy()
probePolicy.initialize(initialState)

for (let t = 0; t < maxSteps; t++) {
  const action = probePolicy.decide(state, ladder)  // No routing!
  if (probePolicy.isStable(state)) break
}

Philosophy Preserved (and Strengthened!)

Gradient-Guided: Now multi-dimensional (Ladder + Probes + History) ✅ Hierarchical: Made explicit through inner/outer loops ✅ Modular: Enhanced - ProbePolicy and Planner are independently replaceable ✅ Bounded: Dual-layer budgets (inner + outer) ✅ Convergence: Explicit through isStable() + budget

Impact

Benchmark Results (GitHub Search):

Metric v0.2 (Flat) v0.3 (Hierarchical) Change
LLM Calls 2-10 (unpredictable) 2-3 (predictable) Predictable
API Calls 2 5 ⚠️ More exploration
Repos Found 3 10 3.3x better
Success Rate 60% 85% +25%
Total Cost 4-12 units 5-8 units Predictable
Duration 15s 25s ⚠️ Slower (but thorough)

Key Insights:

  • Same LLM cost, better coverage through systematic exploration
  • Predictable resource usage enables autonomous operation
  • Deterministic inner loop enables reproducibility

What Became Optional

In v0.3, these components moved to "optional meta-control":

  • StrategySelector - Only needed for multi-domain scenarios
  • FailureClassifier - Only needed for complex failure modes
  • TerminationPolicy - Only needed for multi-objective optimization

Why: Most applications work fine with single ProbePolicy + Planner. Advanced scenarios can add these when needed.


Key Lessons Learned

Lesson 1: Start Simple, Add Complexity When Needed

v0.1 → v0.2: Added gradient information when single Ladder proved insufficient v0.2 → v0.3: Added hierarchy when flat architecture showed cost issues

Principle: Don't over-engineer upfront. Let real problems guide evolution.

Lesson 2: Preserve Philosophy, Evolve Implementation

What stayed constant:

  • Gradient-guided exploration
  • Modular separation of concerns
  • Bounded sustainability
  • Convergence through stability

What changed:

  • Number of modules (4 → 7 → 8)
  • Architecture (flat → hierarchical)
  • Specific interfaces

Principle: Core philosophy is timeless. Implementation adapts to reality.

Lesson 3: Explicit is Better Than Implicit

v0.1: Implicit resource limits → v0.2: Explicit BudgetTracker v0.2: Unclear LLM usage → v0.3: Explicit inner/outer loops

Principle: Make costs, boundaries, and responsibilities explicit.

Lesson 4: Benchmark Early, Iterate Often

v0.2 limitations discovered through GitHub search implementation v0.3 design validated through comparative benchmarks

Principle: Theory guides design, but practice reveals truth.


v2.1 → v2.2: The Assistive SDK

Date: 2026-02 Status: CURRENT

What Changed

Architectural shift from framework-first to agent-first:

  1. New primary APIcyberloop(agent, opts) wraps any agent with control
  2. Middleware system — Composable beforeStep/afterStep hooks replace monolithic Orchestrator wiring
  3. Agent protocolAgentLike (opaque) and SteppableAgent (step-level) interfaces
  4. Built-in middlewarebudgetMiddleware, telemetryMiddleware, stagnationMiddleware, probeMiddleware, evaluatorMiddleware, policyMiddleware
  5. Advanced middlewarekinematicsMiddleware wraps PhysicsEngine + PIDController from v2.1
  6. Backward compatibleOrchestrator and all v2.1 components preserved

Architecture evolution:

v2.1 (Framework-first):
User Code
   ↓
Orchestrator (coordinates everything)
   ├─ ProbePolicy / KinematicProbePolicy
   ├─ Planner
   ├─ Probes, Evaluator, Ladder
   └─ ControlBudget

v2.2 (Agent-first):
User Code
   ↓
cyberloop(agent, { middleware: [...] })
   ├─ MiddlewareRunner (beforeStep / afterStep)
   │    ├─ budgetMiddleware (auto)
   │    ├─ policyMiddleware (guards + reflexes + base policy)
   │    ├─ kinematicsMiddleware (EKF/PID)
   │    └─ telemetryMiddleware
   └─ SteppableAgent.step() (user-defined)

Why We Evolved

Problem 1: High adoption barrier

// v2.1: User must learn 7+ interfaces to get started
const orchestrator = new Orchestrator({
  env, evaluator, ladder, budget, selector, probes,
  policies: [new KinematicProbePolicy(embedder, engine, pid)],
})
// Steep learning curve for simple use cases

Solution: Progressive disclosure

// v2.2 Tier 1: Wrap any agent in one line
const controlled = cyberloop(myAgent, { budget: { maxSteps: 20 } })

// v2.2 Tier 2: Opt into step-level middleware when ready
const controlled = cyberloop(mySteppableAgent, {
  middleware: [telemetryMiddleware(logger)],
})

Problem 2: Monolithic control wiring

// v2.1: All control logic hardwired in Orchestrator.run()
// Adding a new concern (e.g., stagnation detection) requires
// modifying the Orchestrator or creating a new one

Solution: Composable middleware

// v2.2: Each concern is an independent middleware
const controlled = cyberloop(agent, {
  middleware: [
    stagnationMiddleware({ maxStagnantSteps: 5 }),
    telemetryMiddleware(logger),
    kinematicsMiddleware({ embedder, goalEmbedding, ... }),
  ],
})
// Add/remove/reorder without touching framework internals

Problem 3: Policy stack wiring exposed to users

// v2.1: User manually constructs ChainPolicy
const chain = new ChainPolicy(basePolicy, [guard1, guard2], [reflex1])
const action = await chain.decide(state, ladder)

Solution: policyMiddleware

// v2.2: Declarative policy configuration
const { middleware, decideAction } = policyMiddleware({
  basePolicy, guards: [guard1, guard2], reflexes: [reflex1], ladder,
})
// decideAction(state) inside step(), middleware handles lifecycle

Philosophy Preserved (and Strengthened!)

Gradient-Guided: Middleware provides composable gradient sources (probes, evaluators, kinematics) ✅ Hierarchical: Three tiers (opaque → steppable → advanced) mirror inner/outer loop separation ✅ Modular: Middleware is the ultimate modular separation — each concern is a plug-in ✅ Bounded: budgetMiddleware auto-registered by default; hard limits always enforced ✅ Convergence: isDone() + budget halting provide explicit stopping criteria

Impact

  • 337 tests across 28 files, all passing
  • Zero breaking changes to existing Orchestrator API
  • Three new standalone examples demonstrating progressive adoption
  • Six revised examples (GitHub + Wikipedia) showing migration path

Future Evolution (Speculation)

Potential v2.3 Enhancements

Candidates for addition:

  • Beam search middleware - Parallel candidate exploration
  • Query memoization middleware - Avoid redundant exploration
  • Adaptive threshold middleware - Learn stability criteria dynamically
  • Multi-objective evaluation - Balance multiple goals
  • Outer loop middleware - Planner integration via middleware chain

What will NOT change:

  • The five core pillars (see PHILOSOPHY.md)
  • Hierarchical inner/outer architecture
  • Explicit cost control
  • Modular interfaces
  • Backward compatibility with Orchestrator API

Evolution Principles Going Forward

  1. Preserve core philosophy - Five pillars are immutable
  2. Validate through benchmarks - Theory must meet practice
  3. Document reasoning - Update this file with each version
  4. Maintain backward compatibility - When possible, provide migration paths
  5. Keep it simple - Add complexity only when justified by real problems

For Contributors

When Proposing Changes

Ask yourself:

  1. Does this preserve the five core pillars?

    • Gradient-guided, hierarchical, modular, bounded, convergent
  2. Does this solve a real problem?

    • Not just theoretical elegance, but practical pain points
  3. Is this the simplest solution?

    • Can we achieve the goal with less complexity?
  4. Can we benchmark the improvement?

    • How will we measure success?
  5. Does this maintain backward compatibility?

    • If not, is the breaking change justified?

When Reviewing Evolution

  1. Read PHILOSOPHY.md - Understand what must not change
  2. Read this document - Understand why things changed
  3. Read current AICL.md - Understand current state
  4. Check ADRs - See detailed decision records

Conclusion

AICL has evolved from a simple 4-module feedback loop to a sophisticated middleware-based SDK. Through each evolution:

Core philosophy preserved - Five pillars remain constant ✅ Practical problems solved - Each change addressed real limitations ✅ Complexity justified - Added only when simpler approaches failed ✅ Benchmarks validated - Theory met practice successfully

The framework will continue to evolve, but always guided by the timeless principles in PHILOSOPHY.md.

"Evolution is not about changing what we are, but about becoming more fully what we always were."


Last Updated: 2026-02-14 Next Review: When v2.3 is proposed Maintained by: CyberLoop Project License: Apache-2.0