Project O V2 is a self-evolving AI Agent system with industrial-grade fault tolerance. Building upon the original Gerbil-based architecture, V2 introduces an Elixir/OTP supervision layer that prevents catastrophic self-destruction during evolution.
- Overview
- The Self-Destruction Problem
- The Elixir Solution
- System Architecture V2
- The Trinity: Elixir + Gerbil + Zig/Rust
- Protected Evolution Cycle
- Multi-Threaded Evolution
- Implementation Details
- Performance Analysis
- Development Roadmap
Create an AI Agent that can:
- Learn from its own performance and user feedback
- Modify its own code based on learnings
- Survive catastrophic bugs during self-modification
- Evolve in parallel universes and select the best version
- Recover instantly from crashes with full memory intact
| Layer | Technology | Responsibility | Portion |
|---|---|---|---|
| Supervision Layer | Elixir/OTP | Process lifecycle, state persistence, evolution arbitration | 10% |
| Meta Layer | Gerbil Scheme | Self-modification engine, code generation | 12% |
| Application Layer | Gerbil Scheme | Agent logic, DSL, memory, tools | 53% |
| Infrastructure Layer | Zig | HTTP, WebSocket, databases, search | 15% |
| Compute Layer | Rust | Vector operations, ML inference, encryption | 8% |
| Foundation Layer | C Libraries | PostgreSQL, SQLite, OpenSSL | 2% |
In the original architecture, the Agent could modify its own code at runtime. However, this created a critical vulnerability:
┌─────────────────────────────────────────────────────────────┐
│ SCENARIO: Agent Evolution Gone Wrong │
└─────────────────────────────────────────────────────────────┘
1. Agent detects memory system is slow
2. Agent generates "optimized" memory code
3. Agent hot-loads new memory module
4. ❌ BUG: New code has infinite recursion
5. 💥 Memory system crashes
6. 🧠 Agent loses all memory (amnesia)
7. 🔄 Cannot roll back (rollback system needs memory)
8. ☠️ System permanently dead
The Paradox: The Agent needs memory to roll back, but the memory system is what failed.
| Approach | Problem |
|---|---|
| Git rollback | Requires the Agent to remember the rollback command |
| Backup files | Requires the Agent to know where backups are |
| Validation before load | Cannot catch all runtime bugs (e.g., race conditions) |
| Sandboxing | Cannot prevent logic errors that corrupt state |
Root Cause: The Agent is both the executor and the guardian of its own evolution. When it fails, there's no external entity to rescue it.
┌─────────────────────────────────────────────────────────────┐
│ THE TRINITY ARCHITECTURE │
└─────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Elixir Supervisor (The Immortal Guardian) │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ • Monitors Gerbil process heartbeat │ │
│ │ • Holds memory snapshots in ETS/DETS │ │
│ │ • Manages shadow instances for testing │ │
│ │ • Restarts crashed processes with last known state │ │
│ └────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
↕ Port Communication
┌──────────────────────────────────────────────────────────────┐
│ Gerbil Agent (The Evolving Brain) │
│ • Runs main agent logic │
│ • Generates and tests new code │
│ • Sends checkpoints to Elixir before risky operations │
│ • Can crash without consequences │
└──────────────────────────────────────────────────────────────┘
↓ FFI
┌──────────────────────────────────────────────────────────────┐
│ Zig/Rust (The High-Performance Muscle) │
│ • HTTP, databases, vector operations │
│ • Stateless, cannot corrupt agent memory │
└──────────────────────────────────────────────────────────────┘
- Separation of Concerns
  - Elixir: Lifecycle management (cannot crash)
  - Gerbil: Evolution logic (can crash safely)
- External State Persistence
  - Critical state stored in Elixir's ETS/DETS
  - Gerbil crashes don't affect Elixir's memory
- Shadow Testing
  - New code tested in isolated shadow instances
  - Main instance only updated if shadow succeeds
- Instant Recovery
  - Elixir detects a crashed Gerbil process within milliseconds (a hung process is caught by the heartbeat timeout instead)
  - Restarts Gerbil with last checkpoint + WAL replay
  - Total downtime: ~50ms
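The recovery path described above (last checkpoint plus WAL replay) can be sketched as follows. This is a minimal illustration rather than the project's implementation: the `Checkpoint` type, the `apply_entry` and `restore` functions, and the payload shape of each operation are assumptions made for the example.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical snapshot: agent state plus the last WAL sequence it covers."""
    sequence: int
    state: dict = field(default_factory=dict)


def apply_entry(state: dict, entry: dict) -> None:
    """Apply one WAL entry in place (the payload shapes are assumed)."""
    operation = entry["operation"]
    if operation == "memory_add":
        state.setdefault("memory", []).append(entry["data"])
    elif operation == "state_update":
        state.update(entry["data"])
    else:
        raise ValueError(f"unknown WAL operation: {operation}")


def restore(checkpoint: Checkpoint, wal: list) -> dict:
    """Rebuild state from a checkpoint, replaying only entries newer than it."""
    state = copy.deepcopy(checkpoint.state)  # never mutate the stored snapshot
    for entry in sorted(wal, key=lambda e: e["sequence"]):
        if entry["sequence"] > checkpoint.sequence:
            apply_entry(state, entry)
    return state


checkpoint = Checkpoint(sequence=2, state={"memory": ["a", "b"], "mode": "idle"})
wal = [
    {"sequence": 2, "operation": "memory_add", "data": "b"},  # already in the snapshot
    {"sequence": 3, "operation": "memory_add", "data": "c"},
    {"sequence": 4, "operation": "state_update", "data": {"mode": "busy"}},
]
recovered = restore(checkpoint, wal)
print(recovered)
```

Entries at or below the checkpoint's sequence number are skipped, so replaying the same log twice does not duplicate memory.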
┌─────────────────────────────────────────────────────────────────┐
│ SUPERVISION LAYER (NEW!) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Elixir/OTP Supervisor Tree │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐ │ │
│ │ │Lifecycle │ │MemoryVault│ │Evolution │ │HealthMon │ │ │
│ │ │ Manager │ │ (ETS/DETS)│ │ Arbiter │ │(Telemetry)│ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └───────────┘ │ │
│ │ │ │
│ │ Communication: Port (MessagePack) + Shared Memory │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
↕ Supervised Port
┌─────────────────────────────────────────────────────────────────┐
│ META LAYER │
│ Gerbil Self-Modification Engine │
│ • Code generation (macros, eval) │
│ • Checkpoint creation before risky operations │
│ • WAL (Write-Ahead Log) for operation logging │
│ • Elixir bridge for communication │
└─────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ Gerbil Agent Core │
│ • Agent DSL, Memory System, Tool Framework │
│ • LLM Integration, Workflow Engine │
│ • All application logic (unchanged from V1) │
└─────────────────────────────────────────────────────────────────┘
↓ FFI
┌─────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE LAYER │
│ Zig Dynamic Libraries │
│ • HTTP Client/Server, WebSocket │
│ • PostgreSQL, SQLite, JSON Parser │
│ • Search Index │
└─────────────────────────────────────────────────────────────────┘
↓ FFI
┌─────────────────────────────────────────────────────────────────┐
│ COMPUTE LAYER │
│ Rust Dynamic Libraries │
│ • Vector Operations (SIMD-optimized) │
│ • ML Inference, Encryption, Compression │
└─────────────────────────────────────────────────────────────────┘
Responsibilities:
- Manage Gerbil process lifecycle (start, monitor, restart)
- Hold Agent's core state stub (last resort memory backup)
- Handle external I/O (WebSocket, API) and forward to Gerbil
- Orchestrate shadow testing and evolution experiments
- Provide instant crash recovery
Why Elixir?
- BEAM VM's lightweight processes (millions of them)
- OTP's battle-tested supervision trees (30+ years in telecom)
- Built-in hot code reloading
- Excellent fault isolation
- ETS/DETS for fast in-memory and persistent storage
Responsibilities:
- Run main agent logic loop
- Self-evolve: generate new `.ss` or `.so` files at runtime
- Notify Elixir before risky operations: "I'm about to update my logic"
- Maintain rich internal state (closures, continuations)
Why Gerbil?
- Lisp metaprogramming: code is data, data is code
- Compiled macros (AOT) for performance
- Native C FFI through Gambit
- Can `eval` new code at runtime with full access to the environment
Responsibilities:
- Heavy computation (vector search, HTTP, databases)
- Stateless operations (no internal state to corrupt)
- High performance (compiled, SIMD-optimized)
Why Zig/Rust?
- Zig: Simple C ABI, fast compilation, great for I/O
- Rust: Memory safety, SIMD, great for compute-heavy tasks
┌─────────────────────────────────────────────────────────────────┐
│ PROTECTED SELF-EVOLUTION CYCLE │
└─────────────────────────────────────────────────────────────────┘
Phase 1: PERCEPTION
Gerbil: Detects performance bottleneck
↓
Gerbil → Elixir: {:evolution_intent, hypothesis}
↓
Elixir: Logs intent, prepares for potential rollback
Phase 2: CHECKPOINT
Gerbil: Serializes current state
↓
Gerbil → Elixir: {:checkpoint, memory_snapshot, wal_entries}
↓
Elixir: Persists to DETS + disk
↓
Elixir → Gerbil: {:checkpoint_ack, checkpoint_id}
Phase 3: SHADOW TESTING
Elixir: fork() creates Shadow Gerbil instance
↓
Elixir → Shadow: {:load_code, new_code_v2}
↓
Elixir: Routes 10% traffic to Shadow
↓
Shadow: Runs for 5 minutes, collects metrics
↓
Elixir: Compares Shadow vs Main performance
Phase 4: DECISION
IF Shadow performs better:
Elixir → Main: {:hot_reload, new_code_v2}
Main: Executes hot reload
ELSE:
Elixir: kill(Shadow)
Elixir: Logs failure reason
Phase 5: VALIDATION
Elixir: Monitors Main health (heartbeat + memory + latency)
↓
IF crash or performance degradation:
Elixir: kill(Main)
Elixir: Spawns new Gerbil instance
Elixir → New: {:restore, checkpoint_id, wal_entries}
New: Restores to pre-crash state
ELSE:
Elixir: Deletes old checkpoint, keeps new version
Phase 6: LEARNING
Elixir: Records evolution outcome
↓
Elixir → Gerbil: {:evolution_result, success: true/false, metrics: ...}
↓
Gerbil: Updates evolution strategy based on feedback
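The Phase 4 decision can be sketched as a pure comparison of Shadow and Main metrics. The `Metrics` fields, the `decide` function, and the 5% minimum improvement threshold are assumptions for illustration; the source only says the Shadow must perform better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Hypothetical metrics collected during the shadow test window."""
    p50_latency_ms: float
    error_rate: float  # fraction of failed requests, in [0, 1]
    crashed: bool = False


def decide(main: Metrics, shadow: Metrics, min_improvement: float = 0.05) -> str:
    """Return "promote" if the shadow is clearly better, otherwise "discard"."""
    if main.p50_latency_ms <= 0:
        raise ValueError("main latency must be positive")
    if shadow.crashed or shadow.error_rate > main.error_rate:
        return "discard"
    # Relative latency improvement of the shadow over main.
    improvement = (main.p50_latency_ms - shadow.p50_latency_ms) / main.p50_latency_ms
    return "promote" if improvement >= min_improvement else "discard"


main = Metrics(p50_latency_ms=10.0, error_rate=0.01)
print(decide(main, Metrics(p50_latency_ms=8.0, error_rate=0.01)))   # promote
print(decide(main, Metrics(p50_latency_ms=9.8, error_rate=0.01)))   # discard
print(decide(main, Metrics(p50_latency_ms=5.0, error_rate=0.02)))   # discard
```

Requiring a minimum improvement keeps measurement noise from triggering a hot reload; a real arbiter would also need the statistical significance testing listed under Canary Deployment.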
- Heartbeat Monitoring
  - Gerbil sends heartbeat every 1 second
  - Elixir declares death after 5 seconds of silence
  - Automatic restart with last checkpoint
- Write-Ahead Log (WAL)
  - Every critical operation logged before execution
  - Crash recovery: last checkpoint + WAL replay
  - Maximum data loss: 1 second
- Shadow Testing
  - New code tested in isolated process
  - Main instance untouched until shadow proves stable
  - Automatic rollback if shadow fails
- Resource Limits
  - Memory quota per shadow instance
  - CPU time limits
  - Maximum concurrent evolutions
- Canary Deployment
  - Gradual traffic shift (10% → 50% → 100%)
  - Statistical significance testing
  - Automatic rollback on regression
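The heartbeat rule above (a beat every second, death declared after 5 seconds of silence) can be sketched as a small watchdog. The `HeartbeatMonitor` class is hypothetical, and timestamps are passed in explicitly so the logic stays deterministic; a real monitor would read a monotonic clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat and reports when the timeout has elapsed."""

    def __init__(self, timeout_s: float = 5.0, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_beat = now

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_beat = now

    def is_dead(self, now: float) -> bool:
        """True once at least `timeout_s` seconds have passed without a beat."""
        return now - self.last_beat >= self.timeout_s


monitor = HeartbeatMonitor(timeout_s=5.0, now=0.0)
for t in (1.0, 2.0, 3.0):   # three normal heartbeats, then silence
    monitor.beat(t)

print(monitor.is_dead(7.9))  # False: 4.9 s of silence
print(monitor.is_dead(8.0))  # True: 5.0 s of silence, restart from checkpoint
```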
Instead of cautiously modifying itself in place, the Agent can spawn hundreds of parallel versions, each trying different mutations:
┌─────────────────────────────────────────────────────────────────┐
│ DARWINIAN EVOLUTION IN PARALLEL UNIVERSES │
└─────────────────────────────────────────────────────────────────┘
Step 1: MUTATION
Elixir spawns 100 Gerbil shadow instances
Each receives different mutation instructions:
- Shadow A: Use Zig-optimized vector search
- Shadow B: Use pure Scheme implementation
- Shadow C: Use randomly generated macro logic
- Shadow D-Z: Various parameter tweaks
Step 2: COMPETITION
Elixir sends same task to all 100 shadows
Each shadow processes independently
Step 3: EVALUATION
Elixir monitors:
- Execution time
- Memory usage
- Error rate
- Result quality
Step 4: SELECTION
Shadow B: Too slow → killed
Shadow C: Segfault → killed
Shadow A: 50% faster, 0 errors → WINNER
Step 5: PROMOTION
Elixir: Saves Shadow A's code as new main version
Elixir: All future shadows inherit from Shadow A
Elixir: Kills remaining shadows
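The selection step can be sketched as a filter followed by a ranking. The `ShadowResult` fields and the `select_winner` function are assumptions for illustration, and result quality is left out because the source does not say how it is scored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShadowResult:
    """Hypothetical per-shadow measurements gathered by the arbiter."""
    name: str
    exec_time_ms: float
    memory_mb: float
    error_rate: float
    crashed: bool = False


def select_winner(results: list, baseline_ms: float,
                  max_error_rate: float = 0.0) -> Optional[ShadowResult]:
    """Drop crashed, failing, or slower-than-main shadows; return the fastest survivor."""
    survivors = [
        r for r in results
        if not r.crashed
        and r.error_rate <= max_error_rate
        and r.exec_time_ms < baseline_ms
    ]
    if not survivors:
        return None  # no mutation beat the current main version
    # Fastest wins; lower memory usage breaks ties.
    return min(survivors, key=lambda r: (r.exec_time_ms, r.memory_mb))


results = [
    ShadowResult("A", exec_time_ms=50.0, memory_mb=90.0, error_rate=0.0),
    ShadowResult("B", exec_time_ms=400.0, memory_mb=70.0, error_rate=0.0),
    ShadowResult("C", exec_time_ms=0.0, memory_mb=0.0, error_rate=1.0, crashed=True),
]
winner = select_winner(results, baseline_ms=100.0)
print(winner.name if winner else "keep main")
```

Returning `None` when nothing beats the baseline keeps the main version in place, matching the rule that Main is only updated when a shadow succeeds.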
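```elixir
# Pseudo-code for genetic evolution. Helpers such as spawn_shadow/1,
# run_benchmark/1, tournament_selection/2, crossover/2 and mutate/2
# are assumed to be defined elsewhere.
population = initialize_population(base_code, size: 50)

# Run one individual in a shadow and pair it with its fitness score
evaluate = fn individual ->
  shadow = spawn_shadow(individual.code)
  metrics = run_benchmark(shadow)
  {individual, calculate_fitness(metrics)}
end

final_population =
  Enum.reduce(1..10, population, fn _generation, population ->
    # Evaluate fitness, best first
    evaluated =
      population
      |> Enum.map(evaluate)
      |> Enum.sort_by(fn {_individual, fitness} -> fitness end, :desc)

    # Selection (tournament)
    selected = tournament_selection(evaluated, count: 25)

    # Crossover, then mutation
    mutated =
      selected
      |> crossover(rate: 0.7)
      |> mutate(rate: 0.1)

    # Elitism (keep the top 5 unchanged)
    elite = evaluated |> Enum.take(5) |> Enum.map(&elem(&1, 0))

    elite ++ mutated
  end)

# Re-evaluate the final population and promote the fittest individual
{best, _fitness} =
  final_population
  |> Enum.map(evaluate)
  |> Enum.max_by(fn {_individual, fitness} -> fitness end)

promote_to_main(best)
```
┌─────────────────────────────────────────────────────────────────┐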
│ RED TEAM vs BLUE TEAM │
└─────────────────────────────────────────────────────────────────┘
Red Team (Attack Group):
• Generates adversarial inputs
• Finds edge cases that break current logic
• Creates stress tests
Blue Team (Defense Group):
• Modifies code to handle Red Team's attacks
• Patches vulnerabilities
• Improves robustness
Elixir Arbiter:
• Runs Red vs Blue matches
• Promotes Blue code that survives attacks
• Promotes Red strategies that find new bugs
• Iterates until convergence
Message Format (MessagePack over Port)
┌──────────────────────────────────────────────────────────┐
│ Length (4 bytes) │ MessagePack Payload │
└──────────────────────────────────────────────────────────┘
Payload:
{
"type": "checkpoint" | "wal_entry" | "heartbeat" | ...,
"timestamp": 1234567890,
"data": { ... },
"metadata": { ... }
}
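The length-prefixed framing above can be sketched as follows. To keep the example dependency-free, JSON stands in for MessagePack (an assumption; a real bridge would use a MessagePack library). The big-endian 4-byte length is also an assumption, chosen because it matches what an Erlang port opened with the `{:packet, 4}` option expects. The `encode_frame` and `decode_frame` names are hypothetical.

```python
import io
import json
import struct

LENGTH = struct.Struct(">I")  # 4-byte unsigned length, big-endian (assumed)


def encode_frame(message: dict) -> bytes:
    """Serialize a message and prepend its 4-byte length."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return LENGTH.pack(len(payload)) + payload


def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    chunks = []
    remaining = n
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a frame")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def decode_frame(stream):
    """Return the next message, or None on a clean end of stream."""
    first = stream.read(1)
    if not first:
        return None
    header = first + read_exact(stream, LENGTH.size - 1)
    (length,) = LENGTH.unpack(header)
    return json.loads(read_exact(stream, length))


heartbeat = {"type": "heartbeat", "timestamp": 1234567890, "data": {}, "metadata": {}}
checkpoint = {"type": "checkpoint", "timestamp": 1234567891, "data": {"id": 1}, "metadata": {}}
stream = io.BytesIO(encode_frame(heartbeat) + encode_frame(checkpoint))

first_message = decode_frame(stream)
second_message = decode_frame(stream)
end_of_stream = decode_frame(stream)
print(first_message["type"], second_message["type"], end_of_stream)
```

Reading the length first lets the receiver pull exactly one message off the pipe at a time, even when several frames arrive in a single read.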
Shared Memory (for hot path)
```c
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    uint64_t sequence_number;
    /* Hot-path counters, updated atomically */
    _Atomic uint64_t total_requests;
    _Atomic uint64_t total_latency_us;  /* microseconds */
    _Atomic uint32_t error_count;
    /* Memory block index (avoid serialization) */
    struct {
        uint64_t block_id;
        uint32_t offset;
        uint32_t size;
    } memory_index[10000];
} SharedState;
```
Checkpoint File Structure:
┌────────────────────────────────────────────────────────┐
│ Header (256 bytes) │
│ - Magic: "O_CKPT_V2" │
│ - Version: 2 │
│ - Timestamp: 1234567890 │
│ - Checkpoint ID: UUID │
│ - Compressed size: N bytes │
│ - Uncompressed size: M bytes │
│ - SHA256 checksum │
├────────────────────────────────────────────────────────┤
│ Compressed Data (zstd) │
│ - Agent state (serialized Gerbil structures) │
│ - Memory blocks │
│ - Tool registry │
│ - LLM conversation history │
│ - Metrics │
└────────────────────────────────────────────────────────┘
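A checkpoint file with this layout could be written and verified roughly as shown below. Several details are assumptions, since the source lists the header fields but not their widths or order: the field sizes, big-endian byte order, zero padding up to 256 bytes, and a checksum computed over the compressed data. `zlib` stands in for zstd and JSON stands in for serialized Gerbil structures so the sketch only needs the standard library.

```python
import hashlib
import json
import struct
import uuid
import zlib

MAGIC = b"O_CKPT_V2"
VERSION = 2
# magic(16) version(4) timestamp(8) uuid(16) compressed(8) uncompressed(8)
# sha256(32) + 164 padding bytes = 256. Field widths are assumed.
HEADER = struct.Struct(">16sIQ16sQQ32s164x")
assert HEADER.size == 256


def write_checkpoint(state: dict, timestamp: int, checkpoint_id: uuid.UUID) -> bytes:
    """Serialize, compress, and prefix the state with a 256-byte header."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    compressed = zlib.compress(raw)
    header = HEADER.pack(MAGIC, VERSION, timestamp, checkpoint_id.bytes,
                         len(compressed), len(raw),
                         hashlib.sha256(compressed).digest())
    return header + compressed


def read_checkpoint(blob: bytes) -> tuple:
    """Validate the header and checksum, then return (checkpoint_id, state)."""
    magic, version, _timestamp, raw_id, csize, usize, digest = HEADER.unpack_from(blob)
    if magic.rstrip(b"\0") != MAGIC or version != VERSION:
        raise ValueError("not a V2 checkpoint")
    compressed = blob[HEADER.size:HEADER.size + csize]
    if len(compressed) != csize or hashlib.sha256(compressed).digest() != digest:
        raise ValueError("checkpoint is truncated or corrupt")
    raw = zlib.decompress(compressed)
    if len(raw) != usize:
        raise ValueError("uncompressed size mismatch")
    return uuid.UUID(bytes=raw_id), json.loads(raw)


checkpoint_id = uuid.uuid4()
state = {"memory": ["a", "b"], "tools": ["search"]}
blob = write_checkpoint(state, 1234567890, checkpoint_id)
restored_id, restored_state = read_checkpoint(blob)
print(restored_id == checkpoint_id, restored_state == state)
```

Verifying the checksum before decompressing means a damaged checkpoint is rejected early, so recovery can fall back to an older one.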
WAL Entry:
{
"sequence": 12345,
"timestamp": 1234567890,
"operation": "memory_add" | "tool_register" | "state_update",
"data": { ... },
"checksum": "sha256_hash"
}
WAL Segment File:
[Entry 1][Entry 2][Entry 3]...[Entry N]
Each entry: [size (4 bytes)][serialized entry]
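The segment layout above can be sketched as an append and scan pair. Assumptions: the size prefix is big-endian, entries are serialized as JSON, the checksum covers every field except `checksum` itself, and a reader stops at the first torn or corrupt entry. The function names are hypothetical.

```python
import hashlib
import io
import json
import struct

SIZE = struct.Struct(">I")  # 4-byte size prefix; byte order is assumed


def checksum(entry: dict) -> str:
    """SHA-256 over the entry's canonical JSON, excluding the checksum field."""
    body = {k: v for k, v in entry.items() if k != "checksum"}
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def make_entry(sequence: int, timestamp: int, operation: str, data: dict) -> dict:
    entry = {"sequence": sequence, "timestamp": timestamp,
             "operation": operation, "data": data}
    entry["checksum"] = checksum(entry)
    return entry


def append_entry(segment, entry: dict) -> None:
    """Append [size (4 bytes)][serialized entry] to the segment."""
    payload = json.dumps(entry).encode("utf-8")
    segment.write(SIZE.pack(len(payload)) + payload)


def read_entries(segment) -> list:
    """Return all intact entries, stopping at a torn or corrupt one."""
    entries = []
    while True:
        header = segment.read(SIZE.size)
        if len(header) < SIZE.size:
            break  # end of segment, or a torn size prefix
        (size,) = SIZE.unpack(header)
        payload = segment.read(size)
        if len(payload) < size:
            break  # crash happened mid-write
        entry = json.loads(payload)
        if entry.get("checksum") != checksum(entry):
            break  # corrupt entry: ignore it and everything after
        entries.append(entry)
    return entries


segment = io.BytesIO()
for seq, op in enumerate(["memory_add", "tool_register", "state_update"], start=1):
    append_entry(segment, make_entry(seq, 1234567890, op, {"n": seq}))

intact = read_entries(io.BytesIO(segment.getvalue()))
torn = read_entries(io.BytesIO(segment.getvalue()[:-5]))  # simulate a crash mid-write
print([e["sequence"] for e in intact], [e["sequence"] for e in torn])
```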
| Metric | V1 (No Elixir) | V2 (With Elixir) | Change |
|---|---|---|---|
| Request latency | 10ms | 11ms | +10% |
| Throughput (QPS) | 10,000 | 9,000 | -10% |
| Memory (single) | 80MB | 100MB | +25% |
| Crash recovery | ∞ (manual) | 50ms | - |
| Evolution test time | N/A | 5 min | - |
| Parallel evolution | 1 | 50+ | 50× or more |
- Shared Memory for Hot Path
  - Metrics, indexes in shared memory
  - Avoid serialization overhead
- Batch WAL Writes
  - Buffer 100 entries or 1 second
  - Reduce I/O operations
- Async Checkpoints
  - Background thread for serialization
  - Non-blocking main loop
- COW Shadow Instances
  - Use `fork()` for shadows
  - Share read-only memory
- Use
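The batched WAL write rule listed above (flush after 100 entries or 1 second, whichever comes first) can be sketched as a small buffer. The `WalBatcher` class and its `flush` callback are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```python
class WalBatcher:
    """Buffers WAL entries and flushes on a size or age threshold."""

    def __init__(self, flush, max_entries: int = 100, max_age_s: float = 1.0):
        self._flush = flush            # callable that receives a list of entries
        self.max_entries = max_entries
        self.max_age_s = max_age_s
        self._buffer = []
        self._oldest = None            # arrival time of the oldest buffered entry

    def add(self, entry, now: float) -> None:
        """Buffer one entry and flush if a threshold has been reached."""
        if not self._buffer:
            self._oldest = now
        self._buffer.append(entry)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> bool:
        """Flush if the buffer is full or its oldest entry is too old."""
        if not self._buffer:
            return False
        too_many = len(self._buffer) >= self.max_entries
        too_old = now - self._oldest >= self.max_age_s
        if too_many or too_old:
            batch, self._buffer = self._buffer, []
            self._flush(batch)         # one I/O operation for the whole batch
            return True
        return False


written = []
batcher = WalBatcher(written.append, max_entries=3, max_age_s=1.0)
batcher.add("e1", now=0.0)
batcher.add("e2", now=0.1)
batcher.add("e3", now=0.2)            # size threshold reached: first flush
batcher.add("e4", now=0.5)
flushed_early = batcher.maybe_flush(now=1.4)   # 0.9 s old: not yet
flushed_late = batcher.maybe_flush(now=1.5)    # 1.0 s old: second flush
print(written)
```

A periodic timer would call `maybe_flush` so that a quiet period cannot leave entries buffered for longer than the age limit, which is what bounds data loss to about one second.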
| Week | Tasks | Deliverables |
|---|---|---|
| 0.1 | Elixir project setup, OTP structure | o_supervisor/ directory |
| 0.2 | Port communication, MessagePack | lib/o/gerbil_manager.ex |
| 0.3 | MemoryVault (DETS + file backup) | lib/o/memory_vault.ex |
| 0.4 | HealthMonitor (heartbeat + metrics) | lib/o/health_monitor.ex |
| Week | Tasks | Deliverables |
|---|---|---|
| 1 | Agent core, Elixir bridge | agent/core.ss, agent/elixir-bridge.ss |
| 2 | DSL, state management | agent/dsl.ss, agent/state.ss |
| 3 | Memory system with checkpoints | agent/memory.ss |
| 4 | Tool framework, first working agent | agent/tools.ss |
| Week | Tasks | Deliverables |
|---|---|---|
| 5 | Zig HTTP client module | zig/http_client.zig |
| 6 | Zig database modules | zig/postgres.zig, zig/sqlite.zig |
| 7 | Gerbil FFI layer, hot-reload | ffi/zig-ffi.ss |
| 8 | Integration tests, benchmarks | Test suite |
| Week | Tasks | Deliverables |
|---|---|---|
| 9 | Shadow testing infrastructure | lib/o/evolution_arbiter.ex |
| 10 | WAL system | lib/o/wal_manager.ex |
| 11 | Checkpoint/recovery mechanism | Full recovery flow |
| 12 | First protected evolution demo | Working demo |
| Week | Tasks | Deliverables |
|---|---|---|
| 13 | Parallel shadow spawning | 10+ concurrent shadows |
| 14 | Genetic algorithm evolution | lib/o/genetic_evolution.ex |
| 15 | Adversarial evolution (Red/Blue) | GAN-like evolution |
| 16 | Performance optimization | Production-ready |
| Milestone | Description |
|---|---|
| M1 | Agent modifies its own DSL |
| M2 | Agent generates and compiles new Zig module |
| M3 | Agent optimizes its own memory strategy |
| M4 | Agent learns from user feedback |
| M5 | Agent discovers and fixes its own bugs |
V1 Problem: Agent could permanently destroy itself during evolution.
V2 Solution: Elixir provides an external "immune system" that:
- Prevents death (instant restart)
- Preserves memory (checkpoints + WAL)
- Enables fearless experimentation (shadow testing)
- Unlocks parallel evolution (100+ concurrent experiments)
Elixir = Immortality (cannot die)
Gerbil = Intelligence (can evolve)
Zig/Rust = Performance (fast execution)
The Agent can now fail safely. This transforms evolution from a cautious, incremental process into a bold, experimental one. The Agent can try radical mutations knowing that Elixir will catch it if it falls.
- Read `ELIXIR_INTEGRATION.md` for implementation details
- Follow `IMPLEMENTATION_CHECKLIST.md` for a step-by-step guide
- Review `docs/adr/` for architecture decisions
- Start with Phase 0 (Elixir foundation)
Status: Architecture approved, ready for implementation
Version: 2.0
Last Updated: 2026-01-16