Skip to content

Latest commit

 

History

History
638 lines (501 loc) · 25.7 KB

File metadata and controls

638 lines (501 loc) · 25.7 KB

O - Self-Evolving Agent System V2 (with Elixir Supervision)

Project O V2 is a self-evolving AI Agent system with industrial-grade fault tolerance. Building upon the original Gerbil-based architecture, V2 introduces an Elixir/OTP supervision layer that prevents catastrophic self-destruction during evolution.


Table of Contents

  1. Overview
  2. The Self-Destruction Problem
  3. The Elixir Solution
  4. System Architecture V2
  5. The Trinity: Elixir + Gerbil + Zig/Rust
  6. Protected Evolution Cycle
  7. Multi-Threaded Evolution
  8. Implementation Details
  9. Performance Analysis
  10. Development Roadmap

Overview

The Vision

Create an AI Agent that can:

  • Learn from its own performance and user feedback
  • Modify its own code based on learnings
  • Survive catastrophic bugs during self-modification
  • Evolve in parallel universes and select the best version
  • Recover instantly from crashes with full memory intact

Technology Stack V2

Layer Technology Responsibility Portion
Supervision Layer Elixir/OTP Process lifecycle, state persistence, evolution arbitration 10%
Meta Layer Gerbil Scheme Self-modification engine, code generation 12%
Application Layer Gerbil Scheme Agent logic, DSL, memory, tools 53%
Infrastructure Layer Zig HTTP, WebSocket, databases, search 15%
Compute Layer Rust Vector operations, ML inference, encryption 8%
Foundation Layer C Libraries PostgreSQL, SQLite, OpenSSL 2%

The Self-Destruction Problem

The Fatal Flaw in V1

In the original architecture, the Agent could modify its own code at runtime. However, this created a critical vulnerability:

┌─────────────────────────────────────────────────────────────┐
│ SCENARIO: Agent Evolution Gone Wrong                         │
└─────────────────────────────────────────────────────────────┘

1. Agent detects memory system is slow
2. Agent generates "optimized" memory code
3. Agent hot-loads new memory module
4. ❌ BUG: New code has infinite recursion
5. 💥 Memory system crashes
6. 🧠 Agent loses all memory (amnesia)
7. 🔄 Cannot rollback (rollback system needs memory)
8. ☠️ System permanently dead

The Paradox: The Agent needs memory to rollback, but the memory system is what failed.

Why Traditional Solutions Don't Work

Approach Problem
Git rollback Requires the Agent to remember the rollback command
Backup files Requires the Agent to know where backups are
Validation before load Cannot catch all runtime bugs (e.g., race conditions)
Sandboxing Cannot prevent logic errors that corrupt state

Root Cause: The Agent is both the executor and the guardian of its own evolution. When it fails, there's no external entity to rescue it.


The Elixir Solution

The "External Guardian" Pattern

┌─────────────────────────────────────────────────────────────┐
│                    THE TRINITY ARCHITECTURE                  │
└─────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│  Elixir Supervisor (The Immortal Guardian)                   │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ • Monitors Gerbil process heartbeat                     │  │
│  │ • Holds memory snapshots in ETS/DETS                   │  │
│  │ • Manages shadow instances for testing                 │  │
│  │ • Restarts crashed processes with last known state     │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘
                            ↕ Port Communication
┌──────────────────────────────────────────────────────────────┐
│  Gerbil Agent (The Evolving Brain)                           │
│  • Runs main agent logic                                     │
│  • Generates and tests new code                             │
│  • Sends checkpoints to Elixir before risky operations      │
│  • Can crash without consequences                           │
└──────────────────────────────────────────────────────────────┘
                            ↓ FFI
┌──────────────────────────────────────────────────────────────┐
│  Zig/Rust (The High-Performance Muscle)                      │
│  • HTTP, databases, vector operations                        │
│  • Stateless, cannot corrupt agent memory                   │
└──────────────────────────────────────────────────────────────┘

Key Principles

  1. Separation of Concerns

    • Elixir: Lifecycle management (cannot crash)
    • Gerbil: Evolution logic (can crash safely)
  2. External State Persistence

    • Critical state stored in Elixir's ETS/DETS
    • Gerbil crashes don't affect Elixir's memory
  3. Shadow Testing

    • New code tested in isolated shadow instances
    • Main instance only updated if shadow succeeds
  4. Instant Recovery

    • Elixir detects crash within milliseconds
    • Restarts Gerbil with last checkpoint + WAL replay
    • Total downtime: ~50ms

System Architecture V2

Layered Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    SUPERVISION LAYER (NEW!)                      │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │                  Elixir/OTP Supervisor Tree                 │ │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐ │ │
│  │  │Lifecycle │  │MemoryVault│  │Evolution │  │HealthMon │ │ │
│  │  │ Manager  │  │ (ETS/DETS)│  │ Arbiter  │  │(Telemetry)│ │ │
│  │  └──────────┘  └──────────┘  └──────────┘  └───────────┘ │ │
│  │                                                              │ │
│  │  Communication: Port (MessagePack) + Shared Memory          │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              ↕ Supervised Port
┌─────────────────────────────────────────────────────────────────┐
│                         META LAYER                               │
│                    Gerbil Self-Modification Engine               │
│  • Code generation (macros, eval)                               │
│  • Checkpoint creation before risky operations                  │
│  • WAL (Write-Ahead Log) for operation logging                 │
│  • Elixir bridge for communication                              │
└─────────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────────┐
│                      APPLICATION LAYER                           │
│                    Gerbil Agent Core                             │
│  • Agent DSL, Memory System, Tool Framework                     │
│  • LLM Integration, Workflow Engine                             │
│  • All application logic (unchanged from V1)                    │
└─────────────────────────────────────────────────────────────────┘
                              ↓ FFI
┌─────────────────────────────────────────────────────────────────┐
│                   INFRASTRUCTURE LAYER                           │
│                    Zig Dynamic Libraries                         │
│  • HTTP Client/Server, WebSocket                                │
│  • PostgreSQL, SQLite, JSON Parser                              │
│  • Search Index                                                 │
└─────────────────────────────────────────────────────────────────┘
                              ↓ FFI
┌─────────────────────────────────────────────────────────────────┐
│                      COMPUTE LAYER                               │
│                    Rust Dynamic Libraries                        │
│  • Vector Operations (SIMD-optimized)                           │
│  • ML Inference, Encryption, Compression                        │
└─────────────────────────────────────────────────────────────────┘

The Trinity

Role Distribution

1. Elixir (The Skeleton / Life Support System)

Responsibilities:

  • Manage Gerbil process lifecycle (start, monitor, restart)
  • Hold Agent's core state stub (last resort memory backup)
  • Handle external I/O (WebSocket, API) and forward to Gerbil
  • Orchestrate shadow testing and evolution experiments
  • Provide instant crash recovery

Why Elixir?

  • BEAM VM's lightweight processes (millions of them)
  • OTP's battle-tested supervision trees (30+ years in telecom)
  • Built-in hot code reloading
  • Excellent fault isolation
  • ETS/DETS for fast in-memory and persistent storage

2. Gerbil (The Evolving Cortex)

Responsibilities:

  • Run main agent logic loop
  • Self-evolve: generate new .ss or .so files at runtime
  • Notify Elixir before risky operations: "I'm about to update my logic"
  • Maintain rich internal state (closures, continuations)

Why Gerbil?

  • Lisp metaprogramming: code is data, data is code
  • Compiled macros (AOT) for performance
  • Native C FFI through Gambit
  • Can eval new code at runtime with full access to environment

3. Zig/Rust (The Muscle)

Responsibilities:

  • Heavy computation (vector search, HTTP, databases)
  • Stateless operations (no internal state to corrupt)
  • High performance (compiled, SIMD-optimized)

Why Zig/Rust?

  • Zig: Simple C ABI, fast compilation, great for I/O
  • Rust: Memory safety, SIMD, great for compute-heavy tasks

Protected Evolution Cycle

The V2 Evolution Flow

┌─────────────────────────────────────────────────────────────────┐
│              PROTECTED SELF-EVOLUTION CYCLE                      │
└─────────────────────────────────────────────────────────────────┘

Phase 1: PERCEPTION
  Gerbil: Detects performance bottleneck
  ↓
  Gerbil → Elixir: {:evolution_intent, hypothesis}
  ↓
  Elixir: Logs intent, prepares for potential rollback

Phase 2: CHECKPOINT
  Gerbil: Serializes current state
  ↓
  Gerbil → Elixir: {:checkpoint, memory_snapshot, wal_entries}
  ↓
  Elixir: Persists to DETS + disk
  ↓
  Elixir → Gerbil: {:checkpoint_ack, checkpoint_id}

Phase 3: SHADOW TESTING
  Elixir: fork() creates Shadow Gerbil instance
  ↓
  Elixir → Shadow: {:load_code, new_code_v2}
  ↓
  Elixir: Routes 10% traffic to Shadow
  ↓
  Shadow: Runs for 5 minutes, collects metrics
  ↓
  Elixir: Compares Shadow vs Main performance

Phase 4: DECISION
  IF Shadow performs better:
    Elixir → Main: {:hot_reload, new_code_v2}
    Main: Executes hot reload
  ELSE:
    Elixir: kill(Shadow)
    Elixir: Logs failure reason

Phase 5: VALIDATION
  Elixir: Monitors Main health (heartbeat + memory + latency)
  ↓
  IF crash or performance degradation:
    Elixir: kill(Main)
    Elixir: Spawns new Gerbil instance
    Elixir → New: {:restore, checkpoint_id, wal_entries}
    New: Restores to pre-crash state
  ELSE:
    Elixir: Deletes old checkpoint, keeps new version

Phase 6: LEARNING
  Elixir: Records evolution outcome
  ↓
  Elixir → Gerbil: {:evolution_result, success: true/false, metrics: ...}
  ↓
  Gerbil: Updates evolution strategy based on feedback

Safety Mechanisms

  1. Heartbeat Monitoring

    • Gerbil sends heartbeat every 1 second
    • Elixir declares death after 5 seconds of silence
    • Automatic restart with last checkpoint
  2. Write-Ahead Log (WAL)

    • Every critical operation logged before execution
    • Crash recovery: last checkpoint + WAL replay
    • Maximum data loss: 1 second
  3. Shadow Testing

    • New code tested in isolated process
    • Main instance untouched until shadow proves stable
    • Automatic rollback if shadow fails
  4. Resource Limits

    • Memory quota per shadow instance
    • CPU time limits
    • Maximum concurrent evolutions
  5. Canary Deployment

    • Gradual traffic shift (10% → 50% → 100%)
    • Statistical significance testing
    • Automatic rollback on regression

Multi-Threaded Evolution

Parallel Universe Evolution

Instead of cautiously modifying itself in place, the Agent can spawn hundreds of parallel versions, each trying different mutations:

┌─────────────────────────────────────────────────────────────────┐
│           DARWINIAN EVOLUTION IN PARALLEL UNIVERSES              │
└─────────────────────────────────────────────────────────────────┘

Step 1: MUTATION
  Elixir spawns 100 Gerbil shadow instances
  Each receives different mutation instructions:
    - Shadow A: Use Zig-optimized vector search
    - Shadow B: Use pure Scheme implementation
    - Shadow C: Use randomly generated macro logic
    - Shadow D-Z: Various parameter tweaks

Step 2: COMPETITION
  Elixir sends same task to all 100 shadows
  Each shadow processes independently

Step 3: EVALUATION
  Elixir monitors:
    - Execution time
    - Memory usage
    - Error rate
    - Result quality

Step 4: SELECTION
  Shadow B: Too slow → killed
  Shadow C: Segfault → killed
  Shadow A: 50% faster, 0 errors → WINNER

Step 5: PROMOTION
  Elixir: Saves Shadow A's code as new main version
  Elixir: All future shadows inherit from Shadow A
  Elixir: Kills remaining shadows

Genetic Algorithm Evolution

# Pseudo-code for genetic evolution

population = initialize_population(base_code, size: 50)

for generation in 1..10 do
  # Evaluate fitness
  evaluated = Enum.map(population, fn individual ->
    shadow = spawn_shadow(individual.code)
    metrics = run_benchmark(shadow)
    fitness = calculate_fitness(metrics)
    {individual, fitness}
  end)
  
  # Selection (tournament)
  selected = tournament_selection(evaluated, count: 25)
  
  # Crossover
  offspring = crossover(selected, rate: 0.7)
  
  # Mutation
  mutated = mutate(offspring, rate: 0.1)
  
  # Elitism (keep top 5)
  elite = Enum.take(evaluated, 5)
  
  population = elite ++ mutated
end

best = Enum.max_by(population, & &1.fitness)
promote_to_main(best)

Adversarial Evolution (GAN-like)

┌─────────────────────────────────────────────────────────────────┐
│                  RED TEAM vs BLUE TEAM                           │
└─────────────────────────────────────────────────────────────────┘

Red Team (Attack Group):
  • Generates adversarial inputs
  • Finds edge cases that break current logic
  • Creates stress tests

Blue Team (Defense Group):
  • Modifies code to handle Red Team's attacks
  • Patches vulnerabilities
  • Improves robustness

Elixir Arbiter:
  • Runs Red vs Blue matches
  • Promotes Blue code that survives attacks
  • Promotes Red strategies that find new bugs
  • Iterates until convergence

Implementation Details

Communication Protocol

Message Format (MessagePack over Port)

┌──────────────────────────────────────────────────────────┐
│ Length (4 bytes) │ MessagePack Payload                   │
└──────────────────────────────────────────────────────────┘

Payload:
{
  "type": "checkpoint" | "wal_entry" | "heartbeat" | ...,
  "timestamp": 1234567890,
  "data": { ... },
  "metadata": { ... }
}

Shared Memory (for hot path)

typedef struct {
    uint64_t sequence_number;
    _Atomic uint64_t total_requests;
    _Atomic uint64_t total_latency_us;
    _Atomic uint32_t error_count;
    // Memory block index (avoid serialization)
    struct {
        uint64_t block_id;
        uint32_t offset;
        uint32_t size;
    } memory_index[10000];
} SharedState;

Checkpoint Format

Checkpoint File Structure:
┌────────────────────────────────────────────────────────┐
│ Header (256 bytes)                                     │
│  - Magic: "O_CKPT_V2"                                  │
│  - Version: 2                                          │
│  - Timestamp: 1234567890                               │
│  - Checkpoint ID: UUID                                 │
│  - Compressed size: N bytes                            │
│  - Uncompressed size: M bytes                          │
│  - SHA256 checksum                                     │
├────────────────────────────────────────────────────────┤
│ Compressed Data (zstd)                                 │
│  - Agent state (serialized Gerbil structures)          │
│  - Memory blocks                                       │
│  - Tool registry                                       │
│  - LLM conversation history                            │
│  - Metrics                                             │
└────────────────────────────────────────────────────────┘

WAL Format

WAL Entry:
{
  "sequence": 12345,
  "timestamp": 1234567890,
  "operation": "memory_add" | "tool_register" | "state_update",
  "data": { ... },
  "checksum": "sha256_hash"
}

WAL Segment File:
[Entry 1][Entry 2][Entry 3]...[Entry N]

Each entry: [size (4 bytes)][serialized entry]

Performance Analysis

Overhead Comparison

Metric V1 (No Elixir) V2 (With Elixir) Overhead
Request latency 10ms 11ms +10%
Throughput (QPS) 10,000 9,000 -10%
Memory (single) 80MB 100MB +25%
Crash recovery ∞ (manual) 50ms -
Evolution test time N/A 5 min -
Parallel evolution 1 50+ +5000%

Optimization Strategies

  1. Shared Memory for Hot Path

    • Metrics, indexes in shared memory
    • Avoid serialization overhead
  2. Batch WAL Writes

    • Buffer 100 entries or 1 second
    • Reduce I/O operations
  3. Async Checkpoints

    • Background thread for serialization
    • Non-blocking main loop
  4. COW Shadow Instances

    • Use fork() for shadows
    • Share read-only memory

Development Roadmap

Phase 0: Elixir Foundation (2 weeks)

Week Tasks Deliverables
0.1 Elixir project setup, OTP structure o_supervisor/ directory
0.2 Port communication, MessagePack lib/o/gerbil_manager.ex
0.3 MemoryVault (DETS + file backup) lib/o/memory_vault.ex
0.4 HealthMonitor (heartbeat + metrics) lib/o/health_monitor.ex

Phase 1: Gerbil Core (4 weeks)

Week Tasks Deliverables
1 Agent core, Elixir bridge agent/core.ss, agent/elixir-bridge.ss
2 DSL, state management agent/dsl.ss, agent/state.ss
3 Memory system with checkpoints agent/memory.ss
4 Tool framework, first working agent agent/tools.ss

Phase 2: Infrastructure (4 weeks)

Week Tasks Deliverables
5 Zig HTTP client module zig/http_client.zig
6 Zig database modules zig/postgres.zig, zig/sqlite.zig
7 Gerbil FFI layer, hot-reload ffi/zig-ffi.ss
8 Integration tests, benchmarks Test suite

Phase 3: Protected Evolution (4 weeks)

Week Tasks Deliverables
9 Shadow testing infrastructure lib/o/evolution_arbiter.ex
10 WAL system lib/o/wal_manager.ex
11 Checkpoint/recovery mechanism Full recovery flow
12 First protected evolution demo Working demo

Phase 4: Multi-Threaded Evolution (4 weeks)

Week Tasks Deliverables
13 Parallel shadow spawning 10+ concurrent shadows
14 Genetic algorithm evolution lib/o/genetic_evolution.ex
15 Adversarial evolution (Red/Blue) GAN-like evolution
16 Performance optimization Production-ready

Phase 5: Advanced Features (Ongoing)

Milestone Description
M1 Agent modifies its own DSL
M2 Agent generates and compiles new Zig module
M3 Agent optimizes its own memory strategy
M4 Agent learns from user feedback
M5 Agent discovers and fixes its own bugs

Conclusion

The V2 Advantage

V1 Problem: Agent could permanently destroy itself during evolution.

V2 Solution: Elixir provides an external "immune system" that:

  • Prevents death (instant restart)
  • Preserves memory (checkpoints + WAL)
  • Enables fearless experimentation (shadow testing)
  • Unlocks parallel evolution (100+ concurrent experiments)

The Trinity in Action

Elixir  = Immortality (cannot die)
Gerbil  = Intelligence (can evolve)
Zig/Rust = Performance (fast execution)

Key Innovation

The Agent can now fail safely. This transforms evolution from a cautious, incremental process into a bold, experimental one. The Agent can try radical mutations knowing that Elixir will catch it if it falls.

Next Steps

  1. Read ELIXIR_INTEGRATION.md for implementation details
  2. Follow IMPLEMENTATION_CHECKLIST.md for step-by-step guide
  3. Review docs/adr/ for architecture decisions
  4. Start with Phase 0 (Elixir foundation)

Status: Architecture approved, ready for implementation
Version: 2.0
Last Updated: 2026-01-16