Purpose: This document explains why AICL evolved from version to version, showing how each change preserved and strengthened the core philosophy.
For developers: Understand the reasoning behind architectural decisions. For researchers: See how theory evolved through practical implementation. For IDE assistants: Know which principles are stable vs. which details changed.
2025-01 v0.1 Original Concept
↓
2025-03 v0.2 Added Gradient Information
↓
2025-10 v0.3 Hierarchical Control
↓
2025-12 v2.1 Semantic Kinematics (EKF/PID)
↓
2026-02 v2.2 Assistive SDK (CURRENT)
Date: 2025-03 Status: Superseded by v0.3
Added three new modules:
- Probe - Cheap feasibility checks providing directional signals
- BudgetTracker - Explicit resource tracking
- StrategySelector - Policy routing based on failure patterns
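As a rough sketch of the three new modules (the class names, method signatures, and signal strings here are illustrative assumptions for exposition, not the actual v0.2 API):

```js
// Illustrative sketches of the three v0.2 modules (hypothetical shapes).

// Probe: a cheap feasibility check that returns a directional signal.
class HitCountProbe {
  test(state) {
    if (state.hits === 0) return 'stuck-at-zero'
    if (state.hits > 30) return 'too-narrow'
    return 'ok'
  }
}

// BudgetTracker: explicit resource accounting with a hard stop.
class BudgetTracker {
  constructor(limit) {
    this.limit = limit
    this.spent = 0
  }
  record(cost) { this.spent += cost }
  shouldStop() { return this.spent >= this.limit }
}

// StrategySelector: route to a policy based on the observed failure.
class StrategySelector {
  select({ failure, policies }) {
    const policy = policies.find(p => p.handles(failure)) ?? policies[0]
    return { policy }
  }
}
```

The point of the sketch is the division of labor: the probe reads, the tracker counts, and the selector routes; none of them decides actions on its own.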
Architecture evolution:
v0.1 (4 modules):
Environment → Policy → Evaluator → Ladder
v0.2 (7 modules):
Environment → Policy → Evaluator → Ladder
↓ ↓ ↓
Probe → BudgetTracker → StrategySelector
Problem 1: Insufficient Gradient Information

```js
// v0.1: Only Ladder provided gradient
const action = policy.decide(state, ladder)
// Ladder level: 0.5 (exploration intensity)
// But: no information about feasibility or direction!
```

Solution: Added Probes
```js
// v0.2: Multiple gradient sources
const probeResults = probes.map(p => p.test(state))
// Probes provide: "too-narrow", "stuck-at-zero", "drop-detected"
const action = policy.decide(state, ladder, probeResults)
```

Problem 2: Unbounded Exploration
```js
// v0.1: No explicit resource limits
while (true) {
  // Could run forever!
  const action = policy.decide(state, ladder)
  state = env.apply(action)
}
```

Solution: Added BudgetTracker
```js
// v0.2: Explicit limits
while (!budget.shouldStop()) {
  const action = policy.decide(state, ladder)
  state = env.apply(action)
  budget.record(cost(action))
}
```

Problem 3: Single Policy Limitation
```js
// v0.1: One policy for all scenarios
const policy = new GenericPolicy()
// What if different problems need different approaches?
```

Solution: Added StrategySelector
```js
// v0.2: Dynamic policy selection
const { policy } = selector.select({
  failure: classifiedFailure,
  ladderLevel: ladder.level(),
  policies: [policyA, policyB, policyC]
})
```

✅ Gradient-Guided: Enhanced from 1 source (Ladder) to 3 sources (Ladder + Probes + Budget)
✅ Bounded: Made explicit through BudgetTracker
✅ Modular: Added components without breaking existing ones
✅ Convergence: Budget provides hard stopping criteria
✅ Hierarchical: (Not yet, but foundation laid)
- Better exploration: Agents had directional signals, not just intensity
- Predictable costs: Budget made resource usage explicit
- Flexibility: StrategySelector enabled multi-domain applications
After implementing v0.2 in the GitHub search demo, we discovered:
- Unclear cost boundaries - LLM calls could happen in Policy, StrategySelector, or FailureClassifier
- Routing overhead - StrategySelector added complexity for single-domain tasks
- Unpredictable LLM usage - Hard to know total cost before running
- Mixed concerns - Policy handled both tactical and strategic decisions
These limitations led to v0.3...
Date: 2025-10 Status: Superseded by v2.1
Architectural shift from flat to hierarchical:
- Split Policy → ProbePolicy (inner loop) + Planner (outer loop)
- Split Budget → ControlBudget with inner/outer layers
- Clarified roles - Reflexive (cheap, frequent) vs. Strategic (expensive, infrequent)
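The budget split above can be pictured as two nested accounts: an outer layer metering expensive strategic calls and an inner layer metering cheap tactical iterations. The sketch below is hypothetical; `DualLayerBudget` and its method names are illustrative stand-ins, not the real `ControlBudget` API.

```js
// Hypothetical dual-layer budget: outer meters LLM calls,
// inner meters cheap tactical iterations. Names are illustrative.
class DualLayerBudget {
  constructor({ outerLimit, innerLimit }) {
    this.outer = { limit: outerLimit, spent: 0 }
    this.inner = { limit: innerLimit, spent: 0 }
  }
  recordOuter(cost) { this.outer.spent += cost }
  recordInner(cost) { this.inner.spent += cost }
  // Reset the inner account whenever the outer loop hands
  // control back to the inner loop.
  resetInner() { this.inner.spent = 0 }
  shouldStopInner() { return this.inner.spent >= this.inner.limit }
  shouldStopOuter() { return this.outer.spent >= this.outer.limit }
}
```

Keeping the two accounts separate is what makes total cost predictable: the outer limit caps LLM calls regardless of how many inner iterations run.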
Architecture evolution:
v0.2 (Flat):
User Input → StrategySelector → Policy → Probe → Environment
↓
BudgetTracker
v0.3 (Hierarchical):
User Input
↓
[Outer Loop - Planner] (2-3 LLM calls)
↓
[Inner Loop - ProbePolicy] (10-50 iterations)
├─ Probe (gradient signals)
├─ Environment (apply actions)
├─ Evaluator (measure progress)
└─ Ladder (adjust intensity)
↓
[Outer Loop - Planner] (evaluate results)
Problem 1: Cost Boundaries Unclear
```js
// v0.2: Where do LLM calls happen?
const { policy } = selector.select(...)   // LLM call?
const action = policy.decide(state)       // LLM call?
const failure = classifier.classify(...)  // LLM call?
// Total cost: ??? (unpredictable!)
```

Solution: Explicit Layers
```js
// v0.3: Clear separation

// Outer loop (expensive, 2-3 calls):
const initialState = await planner.plan(input)  // LLM call #1

// Inner loop (cheap, 10-50 iterations):
for (let t = 0; t < maxSteps; t++) {
  const probeResults = probes.map(p => p.test(state))  // 0.05 units
  const action = probePolicy.decide(state, ladder)     // 0.1 units (no LLM!)
  state = env.apply(action)
}

// Back to outer loop:
const output = await planner.evaluate(state)  // LLM call #2

// Total cost: 2-3 LLM calls + (10-50 × 0.15) = 5-8 units (predictable!)
```

Problem 2: Mixed Concerns
```js
// v0.2: Policy did everything
class Policy {
  async decide(state, ladder) {
    // Strategic reasoning (expensive)
    const strategy = await this.llm.plan(state)
    // Tactical adjustment (cheap)
    const action = this.adjustFilters(state, strategy)
    // Mixed concerns!
    return action
  }
}
```

Solution: Separate Responsibilities
```js
// v0.3: Clear separation
class Planner {
  async plan(userInput) {
    // Strategic: Create initial exploration strategy
    return await this.llm.generatePlan(userInput)
  }
}

class ProbePolicy {
  decide(state, ladder) {
    // Tactical: Quick adjustments using gradients
    if (state.hits > 30) return this.narrow(state.filters)
    if (state.hits < 10) return this.broaden(state.filters)
    return { type: 'done' }
  }
}
```

Problem 3: Routing Overhead
```js
// v0.2: StrategySelector for every decision
for (let t = 0; t < maxSteps; t++) {
  const { policy } = selector.select(...)  // Overhead!
  const action = policy.decide(state)
}
```

Solution: Single ProbePolicy for Inner Loop
```js
// v0.3: One policy, many iterations
const probePolicy = new DeterministicSearchPolicy()
probePolicy.initialize(initialState)
for (let t = 0; t < maxSteps; t++) {
  const action = probePolicy.decide(state, ladder)  // No routing!
  state = env.apply(action)
  if (probePolicy.isStable(state)) break
}
```

✅ Gradient-Guided: Now multi-dimensional (Ladder + Probes + History)
✅ Hierarchical: Made explicit through inner/outer loops
✅ Modular: Enhanced - ProbePolicy and Planner are independently replaceable
✅ Bounded: Dual-layer budgets (inner + outer)
✅ Convergence: Explicit through isStable() + budget
Benchmark Results (GitHub Search):
| Metric | v0.2 (Flat) | v0.3 (Hierarchical) | Change |
|---|---|---|---|
| LLM Calls | 2-10 (unpredictable) | 2-3 (predictable) | ✅ Predictable |
| API Calls | 2 | 5 | |
| Repos Found | 3 | 10 | ✅ 3.3x better |
| Success Rate | 60% | 85% | ✅ +25% |
| Total Cost | 4-12 units | 5-8 units | ✅ Predictable |
| Duration | 15s | 25s | |
Key Insights:
- Same LLM cost, better coverage through systematic exploration
- Predictable resource usage enables autonomous operation
- Deterministic inner loop enables reproducibility
In v0.3, these components moved to "optional meta-control":
- StrategySelector - Only needed for multi-domain scenarios
- FailureClassifier - Only needed for complex failure modes
- TerminationPolicy - Only needed for multi-objective optimization
Why: Most applications work fine with single ProbePolicy + Planner. Advanced scenarios can add these when needed.
v0.1 → v0.2: Added gradient information when a single Ladder proved insufficient
v0.2 → v0.3: Added hierarchy when the flat architecture showed cost issues
Principle: Don't over-engineer upfront. Let real problems guide evolution.
What stayed constant:
- Gradient-guided exploration
- Modular separation of concerns
- Bounded sustainability
- Convergence through stability
What changed:
- Number of modules (4 → 7 → 8)
- Architecture (flat → hierarchical)
- Specific interfaces
Principle: Core philosophy is timeless. Implementation adapts to reality.
v0.1: Implicit resource limits → v0.2: Explicit BudgetTracker
v0.2: Unclear LLM usage → v0.3: Explicit inner/outer loops
Principle: Make costs, boundaries, and responsibilities explicit.
v0.2 limitations were discovered through the GitHub search implementation.
v0.3 design was validated through comparative benchmarks.
Principle: Theory guides design, but practice reveals truth.
Date: 2026-02 Status: CURRENT
Architectural shift from framework-first to agent-first:
- New primary API — `cyberloop(agent, opts)` wraps any agent with control
- Middleware system — Composable `beforeStep`/`afterStep` hooks replace monolithic Orchestrator wiring
- Agent protocol — `AgentLike` (opaque) and `SteppableAgent` (step-level) interfaces
- Built-in middleware — `budgetMiddleware`, `telemetryMiddleware`, `stagnationMiddleware`, `probeMiddleware`, `evaluatorMiddleware`, `policyMiddleware`
- Advanced middleware — `kinematicsMiddleware` wraps PhysicsEngine + PIDController from v2.1
- Backward compatible — `Orchestrator` and all v2.1 components preserved
Architecture evolution:
v2.1 (Framework-first):
User Code
↓
Orchestrator (coordinates everything)
├─ ProbePolicy / KinematicProbePolicy
├─ Planner
├─ Probes, Evaluator, Ladder
└─ ControlBudget
v2.2 (Agent-first):
User Code
↓
cyberloop(agent, { middleware: [...] })
├─ MiddlewareRunner (beforeStep / afterStep)
│ ├─ budgetMiddleware (auto)
│ ├─ policyMiddleware (guards + reflexes + base policy)
│ ├─ kinematicsMiddleware (EKF/PID)
│ └─ telemetryMiddleware
└─ SteppableAgent.step() (user-defined)
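The diagram above bottoms out at a user-defined `SteppableAgent.step()`. As a toy illustration of such an agent, the `step()`/`isDone()` shape below follows the interface names used in this document, while the counting logic is purely invented:

```js
// Toy illustration of a step-level agent. The step()/isDone() shape
// follows the SteppableAgent names used in this document; the
// internal counting logic is invented for the example.
class CountdownAgent {
  constructor(target) {
    this.target = target
    this.progress = 0
  }
  // One unit of work per control-loop iteration; middleware would
  // run beforeStep/afterStep hooks around each call.
  async step() {
    this.progress += 1
    return { progress: this.progress }
  }
  isDone() {
    return this.progress >= this.target
  }
}
```

Because the agent exposes its progress one step at a time, the surrounding loop can meter, log, or halt it without the agent knowing anything about budgets or telemetry.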
Problem 1: High adoption barrier
```js
// v2.1: User must learn 7+ interfaces to get started
const orchestrator = new Orchestrator({
  env, evaluator, ladder, budget, selector, probes,
  policies: [new KinematicProbePolicy(embedder, engine, pid)],
})
// Steep learning curve for simple use cases
```

Solution: Progressive disclosure
```js
// v2.2 Tier 1: Wrap any agent in one line
const controlled = cyberloop(myAgent, { budget: { maxSteps: 20 } })
```

```js
// v2.2 Tier 2: Opt into step-level middleware when ready
const controlled = cyberloop(mySteppableAgent, {
  middleware: [telemetryMiddleware(logger)],
})
```

Problem 2: Monolithic control wiring
```js
// v2.1: All control logic hardwired in Orchestrator.run()
// Adding a new concern (e.g., stagnation detection) requires
// modifying the Orchestrator or creating a new one
```

Solution: Composable middleware
```js
// v2.2: Each concern is an independent middleware
const controlled = cyberloop(agent, {
  middleware: [
    stagnationMiddleware({ maxStagnantSteps: 5 }),
    telemetryMiddleware(logger),
    kinematicsMiddleware({ embedder, goalEmbedding, ... }),
  ],
})
// Add/remove/reorder without touching framework internals
```

Problem 3: Policy stack wiring exposed to users
```js
// v2.1: User manually constructs ChainPolicy
const chain = new ChainPolicy(basePolicy, [guard1, guard2], [reflex1])
const action = await chain.decide(state, ladder)
```

Solution: policyMiddleware
```js
// v2.2: Declarative policy configuration
const { middleware, decideAction } = policyMiddleware({
  basePolicy, guards: [guard1, guard2], reflexes: [reflex1], ladder,
})
// Call decideAction(state) inside step(); the middleware handles the lifecycle
```

✅ Gradient-Guided: Middleware provides composable gradient sources (probes, evaluators, kinematics)
✅ Hierarchical: Three tiers (opaque → steppable → advanced) mirror inner/outer loop separation
✅ Modular: Middleware is the ultimate modular separation — each concern is a plug-in
✅ Bounded: budgetMiddleware auto-registered by default; hard limits always enforced
✅ Convergence: isDone() + budget halting provide explicit stopping criteria
- 337 tests across 28 files, all passing
- Zero breaking changes to existing Orchestrator API
- Three new standalone examples demonstrating progressive adoption
- Six revised examples (GitHub + Wikipedia) showing migration path
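The middleware contract itself is not spelled out in this document. In the hypothetical sketch below, the `{ beforeStep, afterStep }` object shape and the `ctx` fields are assumptions inferred from the hook names, not the documented v2.2 API:

```js
// Hypothetical custom middleware: counts steps and asks the runner
// to halt after a cap. The { beforeStep, afterStep } shape and the
// ctx fields are assumptions, not the real v2.2 API.
function stepCapMiddleware(maxSteps) {
  let steps = 0
  return {
    beforeStep(ctx) {
      steps += 1
      if (steps > maxSteps) ctx.halt = true  // signal the runner to stop
    },
    afterStep(ctx) {
      ctx.telemetry = { ...ctx.telemetry, steps }
    },
  }
}
```

Whatever the exact contract, the design intent stated above holds: each concern lives in one small closure that can be added, removed, or reordered without touching framework internals.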
Candidates for addition:
- Beam search middleware - Parallel candidate exploration
- Query memoization middleware - Avoid redundant exploration
- Adaptive threshold middleware - Learn stability criteria dynamically
- Multi-objective evaluation - Balance multiple goals
- Outer loop middleware - Planner integration via middleware chain
What will NOT change:
- The five core pillars (see PHILOSOPHY.md)
- Hierarchical inner/outer architecture
- Explicit cost control
- Modular interfaces
- Backward compatibility with Orchestrator API
- Preserve core philosophy - Five pillars are immutable
- Validate through benchmarks - Theory must meet practice
- Document reasoning - Update this file with each version
- Maintain backward compatibility - When possible, provide migration paths
- Keep it simple - Add complexity only when justified by real problems
Ask yourself:
- Does this preserve the five core pillars? (Gradient-guided, hierarchical, modular, bounded, convergent)
- Does this solve a real problem? (Not just theoretical elegance, but practical pain points)
- Is this the simplest solution? (Can we achieve the goal with less complexity?)
- Can we benchmark the improvement? (How will we measure success?)
- Does this maintain backward compatibility? (If not, is the breaking change justified?)
- Read PHILOSOPHY.md - Understand what must not change
- Read this document - Understand why things changed
- Read current AICL.md - Understand current state
- Check ADRs - See detailed decision records
AICL has evolved from a simple 4-module feedback loop to a sophisticated middleware-based SDK. Through each evolution:
✅ Core philosophy preserved - Five pillars remain constant
✅ Practical problems solved - Each change addressed real limitations
✅ Complexity justified - Added only when simpler approaches failed
✅ Benchmarks validated - Theory met practice successfully
The framework will continue to evolve, but always guided by the timeless principles in PHILOSOPHY.md.
"Evolution is not about changing what we are, but about becoming more fully what we always were."
Last Updated: 2026-02-14 Next Review: When v2.3 is proposed Maintained by: CyberLoop Project License: Apache-2.0