v4.0: Strip to commitment loop by JosephOIbrahim · Pull Request #1 · JosephOIbrahim/OTTO

JosephOIbrahim · 2026-02-10T20:11:31Z

Summary

Built OTTO v4.0 from scratch — 8 source files, 92 tests, one job: detect commitments from WhatsApp messages and follow up on them
Removed the entire v3 codebase — 255,798 lines of cognitive OS architecture (LIVRPS, NEXUS, PRISM, pheromone trails, encryption layer, MoE routing) that never shipped the one feature that mattered
Replaced CLAUDE.md — from 830-line v3 spec with borrowed USD composition semantics to 253-line v4 spec that maps 1:1 to actual code

What v4 does

MESSAGE IN --> DETECT --> STORE --> WAIT --> FOLLOW UP --> UPDATE
 (WhatsApp)  (Claude)  (SQLite)  (cron)   (template)   (count++)

Input: WhatsApp Cloud API webhooks via FastAPI
Detection: Claude Sonnet extracts commitments (confidence >= 0.7)
Storage: SQLite (~/.otto/commitments.db), no ORM
Follow-up: Template-based nudges, zero LLM cost, 24h cooldown
Interface: Click CLI — otto list, otto add, otto done, otto park, otto nudge, otto stats, otto watch, otto nuke
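
The STORE, WAIT and FOLLOW UP steps above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the code in otto_v4/: the table schema and the function names (open_store, store_if_confident, due_for_nudge, record_nudge) are assumptions. Only the 0.7 confidence threshold, the 24h cooldown and the no-ORM SQLite choice come from the description.

```python
import sqlite3
from datetime import datetime, timedelta

CONFIDENCE_THRESHOLD = 0.7       # from the PR description
COOLDOWN = timedelta(hours=24)   # from the PR description

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS commitments (
    id INTEGER PRIMARY KEY,
    what TEXT NOT NULL,
    confidence REAL NOT NULL,
    last_nudged_at TEXT,
    follow_up_count INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store using plain SQL, no ORM."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_if_confident(conn, what, confidence):
    """Insert a detected commitment only if it clears the threshold."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None
    cur = conn.execute(
        "INSERT INTO commitments (what, confidence) VALUES (?, ?)",
        (what, confidence),
    )
    conn.commit()
    return cur.lastrowid


def due_for_nudge(conn, now):
    """Return ids that were never nudged or were last nudged >= 24h ago."""
    cutoff = (now - COOLDOWN).isoformat()
    rows = conn.execute(
        "SELECT id FROM commitments "
        "WHERE last_nudged_at IS NULL OR last_nudged_at <= ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


def record_nudge(conn, commitment_id, now):
    """Mark a follow-up as sent and bump the counter (the count++ step)."""
    conn.execute(
        "UPDATE commitments SET last_nudged_at = ?, "
        "follow_up_count = follow_up_count + 1 WHERE id = ?",
        (now.isoformat(), commitment_id),
    )
    conn.commit()
```

Timestamps are stored as ISO 8601 strings from naive datetimes, so plain string comparison in SQL orders them correctly. A cron-style caller would run due_for_nudge, send a template message for each id, then call record_nudge.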

Commits

d7e56f2..fa4144e — v3 build history (kept for archaeology)
2444164 — He2025 attribution cleanup
0c2525d — v4.0 build (phases 0-5)
ca136a1 — CLAUDE.md rewrite
0cbc5ab — 9 fixes from codebase audit (71 → 92 tests)
542c565 — v3 removal (−255,798 lines)

Net change

649 files changed
  3,103 insertions(+)
241,731 deletions(-)

Test plan

python -m pytest otto_v4/tests/ -v -m "not integration" — 92 tests pass
otto list, otto add, otto stats — CLI works end-to-end
CI workflow (tests.yml) runs on Python 3.11-3.13, ubuntu + windows
Merge gate (Phase 6): Real WhatsApp message → real commitment in ~/.otto/commitments.db

Phase 6 is the merge gate. Don't merge until a real commitment from a real WhatsApp message lands in a real database.

🤖 Generated with Claude Code

Implements the immutable foundation for OTTO OS v3.0:
- ConstitutionalPrinciples: 10 frozen principles (frozen dataclass)
- SafetyFloors: protector=10%, decomposer=5%, restorer=5% (frozen)
- validate() function to assert constitutional invariants at runtime
- CLINICAL_BLOCKLIST tuple for user-facing string compliance
- 30 passing tests covering immutability, values, validation, language

Also scaffolds the v3 package structure (otto/) alongside existing src/otto/ (v0.7), bumps version to 3.0.0-dev, and adds CLAUDE.md project spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements Patent Claim #1: layered cognitive property composition inspired by Pixar USD composition arcs:

L(0) Learned → I(1) Inherited → V(2) Volatile → R(3) Reactive → P(4) Protective → S(5) Sovereign

Core modules:
- layers.py: LayerName IntEnum, Layer dataclass, LayerStack collection
- properties.py: CognitiveProperty (frozen) with source_layer tracking
- compositor.py: LIVRPSCompositor with resolve(), resolve_all(), resolve_with_audit(), layer activation/deactivation

[He2025] compliance:
- All iteration uses sorted(), no bare dict.items()
- resolve_all() output sorted by property name
- Descending priority traversal via IntEnum ordering
- 100x determinism test confirms identical results

42 new tests, 72 total (+ Day 1), all passing in 0.25s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
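
The layered resolution described in this commit can be sketched as follows. This is a minimal illustration and not the removed v3 compositor: it assumes the highest-numbered active layer that defines a property wins, which is one reading of "descending priority traversal via IntEnum ordering". The Layer fields and the free functions resolve and resolve_all are simplified stand-ins for the classes named in the message.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LayerName(IntEnum):
    LEARNED = 0
    INHERITED = 1
    VOLATILE = 2
    REACTIVE = 3
    PROTECTIVE = 4
    SOVEREIGN = 5


@dataclass
class Layer:
    # Simplified, hypothetical shape: a name, a property dict, an on/off flag.
    name: LayerName
    values: dict = field(default_factory=dict)
    active: bool = True


def resolve(layers, prop):
    """Return (value, source_layer) from the strongest active layer, or None."""
    # Assumption: larger IntEnum value means stronger layer.
    for layer in sorted(layers, key=lambda l: l.name, reverse=True):
        if layer.active and prop in layer.values:
            return layer.values[prop], layer.name
    return None


def resolve_all(layers):
    """Resolve every known property; output is sorted by property name."""
    names = sorted({p for layer in layers for p in layer.values})
    resolved = []
    for name in names:
        result = resolve(layers, name)
        if result is not None:
            resolved.append((name, result))
    return resolved
```

Deactivating a layer simply makes the traversal fall through to the next weaker layer that defines the property.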

Implements Stage 1 (local, on-device) of the PRISM detection pipeline:

Signals (19 total):
- 8 primary cognitive states: FRUSTRATED, OVERWHELMED, DEPLETED, STUCK, EXPLORING, FOCUSED, HYPERFOCUS, CRASHED
- 6 action signals: commitment tracking, meetings, tasks, decisions
- 5 ambient signals: energy levels, context switches, crash zones

Detection engine:
- 28 regex patterns sorted by (signal_type.name, regex) for [He2025]
- detect(text) returns signals sorted by (-confidence, signal_name)
- detect_primary(text) returns highest-confidence signal or None
- Deduplication: multiple patterns per signal type → keep best confidence
- Deterministic tiebreaker: alphabetical signal name at equal confidence

[He2025] compliance:
- PATTERNS is a tuple (immutable), sorted at module load time
- Pattern evaluation in fixed order
- Output sorted with explicit tiebreaker
- 100x determinism test across 7 sample texts, all identical

52 new tests, 124 total (+ Days 1-2), all passing in 0.31s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
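
The detection rules in this commit (fixed pattern order, best confidence per signal, explicit tiebreaker) can be sketched as below. The four patterns and their confidence values are invented for illustration; the message does not list the 28 real ones. Only the sort keys and the detect / detect_primary contract are taken from the text.

```python
import re
from typing import NamedTuple, Optional


class Signal(NamedTuple):
    name: str
    confidence: float


# Hypothetical patterns: (signal name, regex, confidence).
_RAW_PATTERNS = (
    ("STUCK", r"\bstuck\b", 0.8),
    ("FRUSTRATED", r"\bugh\b", 0.8),
    ("FRUSTRATED", r"\bbroken\b", 0.6),
    ("DEPLETED", r"\bexhausted\b", 0.9),
)

# Immutable tuple, sorted once at import time by (signal name, regex).
PATTERNS = tuple(sorted(_RAW_PATTERNS, key=lambda p: (p[0], p[1])))


def detect(text: str) -> list:
    """Return matched signals sorted by (-confidence, name)."""
    best = {}
    for name, regex, confidence in PATTERNS:
        if re.search(regex, text, re.IGNORECASE):
            # Deduplicate: keep only the best confidence per signal type.
            if confidence > best.get(name, 0.0):
                best[name] = confidence
    signals = [Signal(n, c) for n, c in sorted(best.items())]
    return sorted(signals, key=lambda s: (-s.confidence, s.name))


def detect_primary(text: str) -> Optional[Signal]:
    """Return the highest-confidence signal, or None if nothing matched."""
    signals = detect(text)
    return signals[0] if signals else None
```

Because the sort key includes the signal name, two signals with equal confidence always come back in the same order, regardless of dict or pattern ordering.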

Implements the cognitive routing pipeline (Patent Claim #2) with 5 deterministic phases:

Phase 1: ACTIVATE - signal-to-expert mapping via trigger sets
Phase 2: WEIGHT - confidence * affinity + state boosts, clamped [0,1]
Phase 3: BOUND - constitutional safety floors enforced (immutable)
Phase 4: SELECT - primary + up to 2 supporting (>0.20 threshold)
Phase 5: UPDATE - route callback stub (pheromone trails Day 7)

7 experts with signal affinities:
- Protector (floor 10%): FRUSTRATED, OVERWHELMED, CRASHED
- Decomposer (floor 5%): STUCK, OVERWHELMED, TASK_IMPLIED
- Restorer (floor 5%): DEPLETED, LOW_ENERGY, CRASH_ZONE, CRASHED
- Redirector: CONTEXT_SWITCH
- Acknowledger: HIGH_ENERGY, DECISION_MADE, FOCUSED
- Guide: EXPLORING, DECISION_MADE, FOLLOW_UP_NEEDED
- Executor: FOCUSED, TASK_IMPLIED, HYPERFOCUS, COMMITMENTS

State boosts from LIVRPS-resolved properties (energy, burnout, momentum) influence weighting without breaking determinism.

[He2025] compliance:
- ALL_EXPERTS tuple sorted by name at module load
- STATE_BOOSTS tuple sorted by (property, value, expert)
- All phase iterations use sorted order
- ExpertSelection tiebreaker: (-value, expert_name)
- 100 random inputs verify safety floors hold

51 new tests, 175 total (+ Days 1-3), all passing in 0.34s.

5 full-pipeline integration tests verify PRISM → NEXUS end-to-end:
- "UGH this is broken" → protector
- "I'm completely stuck" → decomposer
- "I'm exhausted" → restorer
- "ready to go" → executor
- "what if we tried" → guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
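
The BOUND and SELECT phases in this commit can be sketched as follows. This is an illustration under stated assumptions, not the removed v3 router: the function names bound and select and the dict-of-weights input are hypothetical. The floor values, the [0,1] clamp, the 0.20 threshold, the limit of two supporting experts and the (-value, expert_name) tiebreaker come from the message.

```python
# Floors from the commit message, kept as a sorted immutable tuple.
SAFETY_FLOORS = (
    ("decomposer", 0.05),
    ("protector", 0.10),
    ("restorer", 0.05),
)


def bound(weights):
    """Clamp weights to [0, 1], then raise floored experts to their floor."""
    bounded = {
        name: min(1.0, max(0.0, weight))
        for name, weight in sorted(weights.items())
    }
    for name, floor in SAFETY_FLOORS:
        bounded[name] = max(bounded.get(name, 0.0), floor)
    return bounded


def select(bounded, threshold=0.20, max_supporting=2):
    """Pick the primary expert plus up to two supporting experts."""
    # Tiebreaker: higher weight first, then alphabetical expert name.
    ranked = sorted(bounded.items(), key=lambda kv: (-kv[1], kv[0]))
    primary = ranked[0][0]
    supporting = [name for name, w in ranked[1:] if w > threshold]
    return primary, supporting[:max_supporting]
```

Since bound always inserts the three floored experts, select never sees an empty ranking, even when no signal activated any expert.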

… isolation Four-type cognitive memory (episodic, procedural, contextual, identity) backed by SQLite with WAL mode. Read-before-write invariant prevents blind overwrites of cognitive data. Identity memory constitutionally excluded from sync/export. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Key wrapping design: random master key encrypted by passphrase-derived wrapping key (Argon2id, memory-hard). Recovery key is the master key hex-encoded, verified against a stored verification blob. Master key never touches disk in plaintext. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Patent Claim #4: distributed learning through deposit/follow/decay. Kahan accumulator for numerically stable float aggregation (O(eps) vs O(n*eps) error bound). Named seed constants for [He2025] determinism. Half-life decay with incremental time reference and threshold pruning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
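
The Kahan accumulator mentioned here is a standard compensated-summation technique. A minimal sketch follows; the class name KahanAccumulator and the naive_sum helper are illustrative and not taken from the removed v3 code.

```python
class KahanAccumulator:
    """Compensated summation: tracks the low-order bits lost on each add."""

    def __init__(self):
        self.total = 0.0
        self._compensation = 0.0  # running estimate of the rounding error

    def add(self, value):
        y = value - self._compensation      # re-inject previously lost bits
        t = self.total + y                  # low-order bits of y may be lost here
        self._compensation = (t - self.total) - y  # recover what was lost
        self.total = t
        return self.total


def naive_sum(values):
    """Plain left-to-right float addition, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total
```

For example, adding 0.1 ten thousand times with naive_sum drifts away from 1000.0, while the compensated total stays within a few units in the last place. The helper uses an explicit loop because the built-in sum() in recent Python versions already applies compensated summation to floats.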

Implements the full API layer connecting OTTO's cognitive architecture to the Anthropic Messages API:
- OTTOClient: SDK wrapper with lazy import, dependency injection, and response normalization (frozen APIResponse)
- EffortController: maps routing decisions to effort levels (protector/restorer → HIGH, agent team → HIGH, default → LOW) with cost estimation and gate thresholds
- NEXUSPipeline: full detect → route → effort → prompt → call pipeline with dry_run support and expert voice system prompts
- CompactionManager: Kahan-stable token tracking with threshold-based compaction triggering

78 new tests, 441 total passing. All tests use mock API clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implements 4 Tier-1 OS services following OTTOService protocol:
- ClockService: time period, day type, time pressure (pure, no deps)
- ProcessMonitor: app context, context switches, process load (psutil with injection fallback)
- GitWatcher: commit velocity, uncommitted changes, stuck detection (subprocess git with injection fallback)
- FileSystemWatcher: activity level, file churn (internal event tracking or snapshot injection)

Plus:
- CategoricalSignal: frozen privacy-safe data type (Patent Claim #3)
- ServiceRegistry: lifecycle management + sorted signal collection
- PlatformInfo: OS/WSL2/dependency detection

All services enforce the privacy boundary: raw data (process names, file paths, commit messages) stays inside the service. Only categorical abstractions (coding/browsing, active/stalled, few/many) cross into downstream processing.

102 new tests, 543 total passing. All services tested with injected providers, with no real psutil/watchdog/git calls in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Chat interface (ChatMessage, ConversationHistory, ChatSession), dashboard state visualization (CognitiveSummary, DashboardState), style constants, TUI skeleton, and MCP tool definitions with dispatch handler. All user-facing strings verified constitutional. 84 tests, 627 total passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Days 16-18 capstone: 55 integration tests covering full pipeline (PRISM→NEXUS→Effort→Prompt), ChatSession+Services flow, Memory+Encryption roundtrip, Pheromone lifecycle, MCP end-to-end dispatch, constitution enforcement across 20 varied inputs.

Performance benchmarks verify <10ms signal detection, <5ms routing, <20ms full pipeline.

Automated audit checks: no bare dict.items(), no clinical language, no minimizing terms, safety floors immutable, privacy boundary enforced, encryption verified, determinism confirmed (100x repeated runs), conventional commits validated.

Fixed "easy wins" minimizing term in restorer voice. 682 total tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ewrite

Crash-recovery safety commit securing all in-progress v3-refactor work:
- Systematic He2025 attribution: "[He2025] Compliant" -> "inspired by [He2025]" across 300+ source files, tests, and documentation
- Remove old top-level otto/ package (84 files), superseded by src/otto/
- Add otto_v3/ clean rewrite following CLAUDE.md Day 1-18 blueprint (core, api, services, mcp, ui modules)
- Enhance interactive CLI with improved session continuity and LLM integration
- Expand memory interface with richer query and retrieval capabilities
- Add He2025 attribution cleanup/thinning utility scripts
- Add .claude/ to .gitignore (local Claude Code config)

5,095 tests passing, 1 skipped. Zero failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…oss codebase

The He2025 paper ("Defeating Nondeterminism in LLM Inference") addresses GPU kernel-level batch invariance. OTTO applies these *principles* at the application layer (sorted iteration, Kahan summation, fixed seeds), which is inspired by but distinct from the paper's kernel-level techniques.

This commit:
- Removes "ThinkingMachines" branding from 107 files (src, tests, docs, dashboard, configs, CI workflows, semgrep rules)
- Renames check_he2025_compliance -> check_determinism_patterns (with backward-compat alias to avoid import breakage)
- Changes HE2025_COMPLIANT -> HE2025_PRINCIPLES_APPLIED
- Updates trail signals: he2025_compliant -> determinism_check_passed
- Corrects overclaims: "is ThinkingMachines Determinism" -> "applies determinism principles inspired by [He2025]"
- Keeps legitimate [He2025] citations as proper academic references
- Only remaining refs: cleanup scripts (intentional) and 1 archived doc

Tests: 5,095 passed, 1 skipped (unchanged)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Complete rewrite from cognitive OS to focused commitment tracker. WhatsApp watcher, Claude-powered detection, SQLite store, nudge system, Click CLI. 71 tests passing, 1,126 lines of source code. Phase 6 (real WhatsApp test) is the human gate before merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

830 lines of cognitive architecture down to 253 lines describing what's actually built. Every section maps to real code in otto_v4/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Move raw SQL from cli.py to store.get_all() and store.avg_follow_ups_done()
- Fix detector.py to parse deadline from Claude response JSON
- Remove unused apscheduler dependency
- Create conftest.py with shared store fixture, deduplicate test_nudge/test_store
- Add 14 watcher tests (verification, message processing, signatures)
- Add 7 tests for new store methods and detector deadline parsing
- Fix Pydantic v2 deprecation warning in watcher.py
- Delete broken Windows path dirs and nul artifact

92 tests passing (was 71).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Deleted 699 files: v3 source (282 .py), v3 tests (168 .py), docs, benchmarks, config, deploy, data, scripts, dashboard, MCP packages, 15 root markdown manifestos, 9.1 MB logo, broken CI workflows. Zero He2025 references remain. Zero ThinkingMachines references remain. What's left: 24 files — OTTO v4 commitment tracker, v4 CI, license. 92 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-10T20:11:44Z

Important

Review skipped

Too many files!

This PR contains 290 files, which is 140 over the limit of 150. Please upgrade to a paid plan to get higher limits.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

🎁 Summarized by CodeRabbit Free

Your organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://app.coderabbit.ai/login.

Comment @coderabbitai help to get the list of available commands and usage tips.

Claude sometimes wraps JSON in ```json ... ``` fences even when told to respond with raw JSON. Strip the fences before parsing. Found during Phase 6 live test (93 tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
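
The fence-stripping fix described here can be sketched as below. This is an illustration of the idea rather than the code in detector.py: the names strip_json_fences and parse_model_json are hypothetical, and the sketch assumes the whole response is either raw JSON or a single fenced block.

```python
import json

FENCE = "`" * 3  # a literal triple backtick, built this way to keep the example readable


def strip_json_fences(text):
    """Remove one surrounding code fence (with optional 'json' tag), if present."""
    stripped = text.strip()
    fenced = (
        len(stripped) >= 2 * len(FENCE)
        and stripped.startswith(FENCE)
        and stripped.endswith(FENCE)
    )
    if not fenced:
        return stripped
    inner = stripped[len(FENCE):-len(FENCE)]
    # The opening fence line may carry a language tag such as "json".
    first_line, newline, rest = inner.partition("\n")
    if newline and first_line.strip().lower() in ("", "json"):
        inner = rest
    return inner.strip()


def parse_model_json(text):
    """Parse a model response that may or may not be fenced."""
    return json.loads(strip_json_fences(text))
```

Responses that mix prose with a fenced block are out of scope for this sketch and would still raise json.JSONDecodeError, which keeps the failure visible instead of guessing.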

- MAX_NUDGES = 3 (was 5), which aligns with CLAUDE.md interaction budget: "If OTTO sends more than 3 nudges in a day, something is wrong"
- CLAUDE.md: soul section (user edit), stale numbers updated:
  - 6 test files, 93 tests (was 5 files, 71)
  - Phase 6 marked DONE (live test 2026-02-10)
  - APScheduler removed from deps list (already gone from pyproject.toml)
  - v3 reference removed from dev environment
- test_nudge: template rotation test made robust (10 samples, not 3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Python's hash() is randomized per process (PYTHONHASHSEED). Two tests relied on hash distribution hitting different template indices, which failed on ubuntu-latest + Python 3.13 CI.

Fixed:
- test_different_counts_different_templates: 10 samples instead of 3
- test_overdue_templates_include_who_to: check template strings directly instead of hoping hash selects a template containing {who_to}

Verified stable across 5 random PYTHONHASHSEED values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
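
The commit above fixes the tests rather than the selection logic. As a complementary sketch, the selection itself could be made independent of PYTHONHASHSEED by deriving the index from a cryptographic digest instead of hash(). This is not what the commit does; the TEMPLATES contents and the names stable_index and pick_template are hypothetical.

```python
import hashlib

# Hypothetical nudge templates, for illustration only.
TEMPLATES = (
    "Still planning to {what}?",
    "Reminder for {who_to}: {what}",
    "Any update on {what}?",
)


def stable_index(key, size):
    """Map a string key to an index in [0, size), identically in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def pick_template(commitment_id, follow_up_count):
    """Choose a template from the commitment id and the follow-up count."""
    key = f"{commitment_id}:{follow_up_count}"
    return TEMPLATES[stable_index(key, len(TEMPLATES))]
```

Either way, a test that needs a particular placeholder is more robust when it inspects the template strings directly, as the second fix in the commit does.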

Joseph Ibrahim and others added 17 commits February 10, 2026 00:32

docs: replace v3 CLAUDE.md with v4.0 — commitment tracker spec

ca136a1

Joseph Ibrahim and others added 3 commits February 10, 2026 15:26

JosephOIbrahim merged commit 7c42fd8 into master Feb 10, 2026
7 checks passed

JosephOIbrahim deleted the v4-reset branch February 10, 2026 20:57

JosephOIbrahim merged 20 commits into master from v4-reset
