v4.0: Strip to commitment loop #1

Merged: JosephOIbrahim merged 20 commits into master from v4-reset on Feb 10, 2026

@JosephOIbrahim
Owner

Summary

  • Built OTTO v4.0 from scratch — 8 source files, 92 tests, one job: detect commitments from WhatsApp messages and follow up on them
  • Removed the entire v3 codebase — 255,798 lines of cognitive OS architecture (LIVRPS, NEXUS, PRISM, pheromone trails, encryption layer, MoE routing) that never shipped the one feature that mattered
  • Replaced CLAUDE.md, shrinking it from an 830-line v3 spec with borrowed USD composition semantics to a 253-line v4 spec that maps 1:1 to actual code

What v4 does

MESSAGE IN --> DETECT --> STORE --> WAIT --> FOLLOW UP --> UPDATE
(WhatsApp)    (Claude)   (SQLite)  (cron)   (template)    (count++)
  • Input: WhatsApp Cloud API webhooks via FastAPI
  • Detection: Claude Sonnet extracts commitments (confidence >= 0.7)
  • Storage: SQLite (~/.otto/commitments.db), no ORM
  • Follow-up: Template-based nudges, zero LLM cost, 24h cooldown
  • Interface: Click CLI — otto list, otto add, otto done, otto park, otto nudge, otto stats, otto watch, otto nuke
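A minimal sketch of the "SQLite, no ORM" store plus the WAIT and UPDATE steps, using only the standard-library `sqlite3` module. The schema, column names, and helpers (`connect`, `add`, `due_for_follow_up`, `record_nudge`) are hypothetical; only the 0.7 threshold, the 24h cooldown, and the count++ update come from the description above.

```python
from __future__ import annotations

import sqlite3
from datetime import datetime, timedelta, timezone

CONFIDENCE_THRESHOLD = 0.7      # "confidence >= 0.7" from the PR summary
COOLDOWN = timedelta(hours=24)  # "24h cooldown" from the PR summary

# Hypothetical schema; the real one in otto_v4/ may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS commitments (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    text            TEXT    NOT NULL,
    confidence      REAL    NOT NULL,
    status          TEXT    NOT NULL DEFAULT 'open',
    follow_ups_done INTEGER NOT NULL DEFAULT 0,
    last_nudged_at  TEXT    -- UTC ISO-8601; NULL until the first nudge
)
"""

def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store; the real CLI would point at ~/.otto/commitments.db."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn

def add(conn: sqlite3.Connection, text: str, confidence: float) -> int | None:
    """DETECT -> STORE: keep only commitments at or above the threshold."""
    if confidence < CONFIDENCE_THRESHOLD:
        return None
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO commitments (text, confidence) VALUES (?, ?)",
            (text, confidence),
        )
    return cur.lastrowid

def due_for_follow_up(conn: sqlite3.Connection, now: datetime | None = None) -> list[sqlite3.Row]:
    """WAIT: open commitments never nudged, or last nudged 24h or more ago."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - COOLDOWN).isoformat()  # identical UTC format compares lexically
    return conn.execute(
        "SELECT * FROM commitments WHERE status = 'open' "
        "AND (last_nudged_at IS NULL OR last_nudged_at <= ?) ORDER BY id",
        (cutoff,),
    ).fetchall()

def record_nudge(conn: sqlite3.Connection, commitment_id: int, now: datetime | None = None) -> None:
    """UPDATE: count++ and stamp the nudge time."""
    now = now or datetime.now(timezone.utc)
    with conn:
        conn.execute(
            "UPDATE commitments SET follow_ups_done = follow_ups_done + 1, "
            "last_nudged_at = ? WHERE id = ?",
            (now.isoformat(), commitment_id),
        )
```

Storing timestamps as ISO-8601 text keeps the cooldown query a plain string comparison, which only holds as long as every timestamp is written in the same UTC format.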

Commits

  1. d7e56f2..fa4144e: v3 build history (kept for archaeology)
  2. 2444164: He2025 attribution cleanup
  3. 0c2525d: v4.0 build (phases 0-5)
  4. ca136a1: CLAUDE.md rewrite
  5. 0cbc5ab: 9 fixes from codebase audit (71 → 92 tests)
  6. 542c565: v3 removal (−255,798 lines)

Net change

649 files changed, 3,103 insertions(+), 241,731 deletions(-)

Test plan

  • python -m pytest otto_v4/tests/ -v -m "not integration" — 92 tests pass
  • otto list, otto add, otto stats — CLI works end-to-end
  • CI workflow (tests.yml) runs on Python 3.11-3.13, ubuntu + windows
  • Merge gate (Phase 6): Real WhatsApp message → real commitment in ~/.otto/commitments.db

Phase 6 is the merge gate. Don't merge until a real commitment from a real WhatsApp message lands in a real database.

🤖 Generated with Claude Code

Joseph Ibrahim and others added 17 commits February 10, 2026 00:32
Implements the immutable foundation for OTTO OS v3.0:

- ConstitutionalPrinciples: 10 frozen principles (frozen dataclass)
- SafetyFloors: protector=10%, decomposer=5%, restorer=5% (frozen)
- validate() function to assert constitutional invariants at runtime
- CLINICAL_BLOCKLIST tuple for user-facing string compliance
- 30 passing tests covering immutability, values, validation, language

Also scaffolds the v3 package structure (otto/) alongside existing
src/otto/ (v0.7), bumps version to 3.0.0-dev, and adds CLAUDE.md
project spec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
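The frozen-dataclass pattern described in this commit can be sketched as follows. The floor names and values come from the message; the `validate()` invariants and the blocklist entries are assumptions (the real `CLINICAL_BLOCKLIST` is not shown in this PR). The sketch raises `ValueError` rather than using `assert`, since assertions are stripped under `python -O`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyFloors:
    # Minimum routing weights as fractions of 1.0 (values from the commit).
    protector: float = 0.10
    decomposer: float = 0.05
    restorer: float = 0.05

# Hypothetical entries; the real CLINICAL_BLOCKLIST is not shown in this PR.
CLINICAL_BLOCKLIST: tuple[str, ...] = ("diagnosis", "disorder", "symptom")

def validate(floors: SafetyFloors = SafetyFloors()) -> None:
    """Raise ValueError if the floors break the (assumed) invariants."""
    values = (floors.protector, floors.decomposer, floors.restorer)
    if not all(0.0 <= v <= 1.0 for v in values):
        raise ValueError("each floor must be a fraction in [0, 1]")
    if sum(values) >= 1.0:
        raise ValueError("floors must leave weight for the other experts")

def is_compliant(text: str) -> bool:
    """True if a user-facing string contains none of the blocklisted terms."""
    lowered = text.lower()
    return not any(term in lowered for term in CLINICAL_BLOCKLIST)
```

Assigning to a field of a frozen dataclass raises `dataclasses.FrozenInstanceError` (a subclass of `AttributeError`), which is what makes the floors immutable at runtime.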
Implements Patent Claim #1 — layered cognitive property composition
inspired by Pixar USD composition arcs:

  L(0) Learned → I(1) Inherited → V(2) Volatile →
  R(3) Reactive → P(4) Protective → S(5) Sovereign

Core modules:
- layers.py: LayerName IntEnum, Layer dataclass, LayerStack collection
- properties.py: CognitiveProperty (frozen) with source_layer tracking
- compositor.py: LIVRPSCompositor with resolve(), resolve_all(),
  resolve_with_audit(), layer activation/deactivation

[He2025] compliance:
- All iteration uses sorted() — no bare dict.items()
- resolve_all() output sorted by property name
- Descending priority traversal via IntEnum ordering
- 100x determinism test confirms identical results

42 new tests, 72 total (+ Day 1), all passing in 0.25s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
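A sketch of descending-priority resolution over an `IntEnum`, following the L→S layer order in the message (higher value wins). The class and method names mirror the commit, but the bodies, the per-layer dict storage, and the audit format (a list of layers consulted) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
from typing import Any

class LayerName(IntEnum):
    LEARNED = 0
    INHERITED = 1
    VOLATILE = 2
    REACTIVE = 3
    PROTECTIVE = 4
    SOVEREIGN = 5

@dataclass(frozen=True)
class CognitiveProperty:
    name: str
    value: Any
    source_layer: LayerName

class LIVRPSCompositor:
    def __init__(self) -> None:
        self._opinions: dict[LayerName, dict[str, Any]] = {}
        self._active: set[LayerName] = set()

    def set_value(self, layer: LayerName, name: str, value: Any) -> None:
        """Record a layer's opinion about a property and activate that layer."""
        self._opinions.setdefault(layer, {})[name] = value
        self._active.add(layer)

    def deactivate(self, layer: LayerName) -> None:
        self._active.discard(layer)

    def resolve_with_audit(self, name: str) -> tuple[CognitiveProperty | None, list[LayerName]]:
        """Strongest active layer defining `name` wins; also report layers consulted."""
        consulted: list[LayerName] = []
        for layer in sorted(self._active, reverse=True):  # SOVEREIGN first
            consulted.append(layer)
            opinions = self._opinions.get(layer, {})
            if name in opinions:
                return CognitiveProperty(name, opinions[name], layer), consulted
        return None, consulted

    def resolve(self, name: str) -> CognitiveProperty | None:
        return self.resolve_with_audit(name)[0]

    def resolve_all(self) -> list[CognitiveProperty]:
        """Resolve every property defined by an active layer, sorted by name."""
        names = sorted({n for layer in self._active for n in self._opinions.get(layer, {})})
        resolved = (self.resolve(n) for n in names)
        return [p for p in resolved if p is not None]
```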
Implements Stage 1 (local, on-device) of the PRISM detection pipeline:

Signals (19 total):
- 8 primary cognitive states: FRUSTRATED, OVERWHELMED, DEPLETED,
  STUCK, EXPLORING, FOCUSED, HYPERFOCUS, CRASHED
- 6 action signals: commitment tracking, meetings, tasks, decisions
- 5 ambient signals: energy levels, context switches, crash zones

Detection engine:
- 28 regex patterns sorted by (signal_type.name, regex) for [He2025]
- detect(text) returns signals sorted by (-confidence, signal_name)
- detect_primary(text) returns highest-confidence signal or None
- Deduplication: multiple patterns per signal type → keep best confidence
- Deterministic tiebreaker: alphabetical signal name at equal confidence

[He2025] compliance:
- PATTERNS is a tuple (immutable), sorted at module load time
- Pattern evaluation in fixed order
- Output sorted with explicit tiebreaker
- 100x determinism test across 7 sample texts, all identical

52 new tests, 124 total (+ Days 1-2), all passing in 0.31s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
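The engine's ordering rules (patterns sorted at load time, best-confidence deduplication per signal, output sorted by `(-confidence, signal_name)`) can be sketched like this. The four patterns and their confidences are made up for illustration; the real engine has 28.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum, auto

class SignalType(Enum):
    FRUSTRATED = auto()
    STUCK = auto()
    DEPLETED = auto()

@dataclass(frozen=True)
class Signal:
    signal_type: SignalType
    confidence: float

# Hypothetical patterns, sorted once at import by (signal name, regex).
PATTERNS: tuple[tuple[SignalType, str, float], ...] = tuple(sorted(
    (
        (SignalType.FRUSTRATED, r"\bugh\b", 0.8),
        (SignalType.FRUSTRATED, r"\bbroken\b", 0.6),
        (SignalType.STUCK, r"\bstuck\b", 0.9),
        (SignalType.DEPLETED, r"\bexhausted\b", 0.85),
    ),
    key=lambda p: (p[0].name, p[1]),
))

def detect(text: str) -> list[Signal]:
    best: dict[SignalType, float] = {}
    for signal_type, regex, confidence in PATTERNS:  # fixed evaluation order
        if re.search(regex, text, re.IGNORECASE):
            # Deduplicate: keep only the best confidence per signal type.
            best[signal_type] = max(confidence, best.get(signal_type, 0.0))
    # Highest confidence first; alphabetical signal name breaks ties.
    return sorted(
        (Signal(t, best[t]) for t in best),
        key=lambda s: (-s.confidence, s.signal_type.name),
    )

def detect_primary(text: str) -> Signal | None:
    signals = detect(text)
    return signals[0] if signals else None
```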
Implements the cognitive routing pipeline (Patent Claim #2) with
5 deterministic phases:

  Phase 1: ACTIVATE — signal-to-expert mapping via trigger sets
  Phase 2: WEIGHT   — confidence * affinity + state boosts, clamped [0,1]
  Phase 3: BOUND    — constitutional safety floors enforced (immutable)
  Phase 4: SELECT   — primary + up to 2 supporting (>0.20 threshold)
  Phase 5: UPDATE   — route callback stub (pheromone trails Day 7)

7 experts with signal affinities:
  Protector  (floor 10%) — FRUSTRATED, OVERWHELMED, CRASHED
  Decomposer (floor  5%) — STUCK, OVERWHELMED, TASK_IMPLIED
  Restorer   (floor  5%) — DEPLETED, LOW_ENERGY, CRASH_ZONE, CRASHED
  Redirector  — CONTEXT_SWITCH
  Acknowledger — HIGH_ENERGY, DECISION_MADE, FOCUSED
  Guide       — EXPLORING, DECISION_MADE, FOLLOW_UP_NEEDED
  Executor    — FOCUSED, TASK_IMPLIED, HYPERFOCUS, COMMITMENTS

State boosts from LIVRPS-resolved properties (energy, burnout,
momentum) influence weighting without breaking determinism.

[He2025] compliance:
  - ALL_EXPERTS tuple sorted by name at module load
  - STATE_BOOSTS tuple sorted by (property, value, expert)
  - All phase iterations use sorted order
  - ExpertSelection tiebreaker: (-value, expert_name)
  - 100 random inputs verify safety floors hold

51 new tests, 175 total (+ Days 1-3), all passing in 0.34s.

5 full-pipeline integration tests verify PRISM → NEXUS end-to-end:
  "UGH this is broken" → protector
  "I'm completely stuck" → decomposer
  "I'm exhausted" → restorer
  "ready to go" → executor
  "what if we tried" → guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
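The WEIGHT, BOUND, and SELECT phases can be sketched as below. The affinity table is hypothetical, and "safety floors enforced" is modelled as raising an expert's weight to at least its floor, which is one plausible reading of the message; state boosts and the UPDATE phase are left out.

```python
from __future__ import annotations

FLOORS = {"decomposer": 0.05, "protector": 0.10, "restorer": 0.05}
SUPPORT_THRESHOLD = 0.20
MAX_SUPPORTING = 2

# Hypothetical signal -> {expert: affinity} table (the ACTIVATE phase).
AFFINITY = {
    "FRUSTRATED": {"protector": 0.9},
    "OVERWHELMED": {"protector": 0.7, "decomposer": 0.8},
    "STUCK": {"decomposer": 0.9},
    "FOCUSED": {"executor": 0.8, "acknowledger": 0.4},
}

def route(signals: dict[str, float]) -> tuple[str, list[str]]:
    """Map signal name -> confidence to (primary expert, supporting experts)."""
    weights: dict[str, float] = {}
    for signal in sorted(signals):  # fixed iteration order
        for expert in sorted(AFFINITY.get(signal, {})):
            w = signals[signal] * AFFINITY[signal][expert]  # WEIGHT
            weights[expert] = min(1.0, max(0.0, weights.get(expert, 0.0) + w))
    for expert in sorted(FLOORS):  # BOUND: never drop below the floor
        weights[expert] = max(weights.get(expert, 0.0), FLOORS[expert])
    ranked = sorted(weights, key=lambda e: (-weights[e], e))  # SELECT, with tiebreaker
    supporting = [e for e in ranked[1:] if weights[e] > SUPPORT_THRESHOLD]
    return ranked[0], supporting[:MAX_SUPPORTING]
```

Because the floors are always applied, `ranked` is never empty, so even a message with no signals routes somewhere (here, to the protector).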
… isolation

Four-type cognitive memory (episodic, procedural, contextual, identity) backed
by SQLite with WAL mode. Read-before-write invariant prevents blind overwrites
of cognitive data. Identity memory constitutionally excluded from sync/export.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
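One way to enforce a read-before-write invariant on top of SQLite in WAL mode is a version check: a writer passes the version it last read, and the write is rejected if the row changed in the meantime. The table layout, the `version` column, and `StaleWriteError` are assumptions, not the commit's actual design.

```python
from __future__ import annotations

import sqlite3

class StaleWriteError(RuntimeError):
    """Raised when a write targets a version the caller never read."""

def open_memory(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " kind TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL,"
        " version INTEGER NOT NULL, PRIMARY KEY (kind, key))"
    )
    return conn

def read(conn: sqlite3.Connection, kind: str, key: str) -> tuple[str | None, int]:
    """Return (value, version); version 0 means the key does not exist yet."""
    row = conn.execute(
        "SELECT value, version FROM memory WHERE kind = ? AND key = ?", (kind, key)
    ).fetchone()
    return (row[0], row[1]) if row else (None, 0)

def write(conn: sqlite3.Connection, kind: str, key: str, value: str, read_version: int) -> int:
    """Compare-and-set: succeeds only if nobody wrote since `read_version`."""
    with conn:
        if read_version == 0:
            cur = conn.execute(
                "INSERT OR IGNORE INTO memory VALUES (?, ?, ?, 1)", (kind, key, value)
            )
        else:
            cur = conn.execute(
                "UPDATE memory SET value = ?, version = version + 1 "
                "WHERE kind = ? AND key = ? AND version = ?",
                (value, kind, key, read_version),
            )
    if cur.rowcount != 1:
        raise StaleWriteError(f"{kind}/{key} changed since version {read_version}")
    return read_version + 1
```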
Key wrapping design: random master key encrypted by passphrase-derived
wrapping key (Argon2id, memory-hard). Recovery key is the master key
hex-encoded, verified against a stored verification blob. Master key
never touches disk in plaintext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
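The recovery-key check can be sketched with the standard library alone. This covers only the verification blob, and it assumes the blob is an HMAC-SHA256 of a fixed label under the master key (the commit does not say how the blob is built). The Argon2id wrapping step is left out because it needs a third-party KDF and an authenticated cipher.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

VERIFY_LABEL = b"otto-recovery-verify"  # hypothetical constant

def new_master_key() -> bytes:
    return secrets.token_bytes(32)  # 256-bit random master key

def verification_blob(master_key: bytes) -> bytes:
    # Stored on disk; checking a candidate means recomputing this HMAC.
    return hmac.new(master_key, VERIFY_LABEL, hashlib.sha256).digest()

def recovery_key(master_key: bytes) -> str:
    return master_key.hex()  # per the commit: the master key, hex-encoded

def check_recovery_key(candidate: str, stored_blob: bytes) -> bytes | None:
    """Return the master key if `candidate` matches the stored blob, else None."""
    try:
        key = bytes.fromhex(candidate.strip())
    except ValueError:
        return None
    if hmac.compare_digest(verification_blob(key), stored_blob):  # constant-time
        return key
    return None
```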
Patent Claim #4: distributed learning through deposit/follow/decay.
Kahan accumulator for numerically stable float aggregation (O(eps) vs
O(n*eps) error bound). Named seed constants for [He2025] determinism.
Half-life decay with incremental time reference and threshold pruning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
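Kahan (compensated) summation and half-life decay are both standard techniques; a sketch of how they might combine in a trail store is below. `PheromoneTrail`, its parameters, and the pruning threshold are hypothetical.

```python
from __future__ import annotations

class KahanAccumulator:
    """Compensated summation: tracks the low-order bits each add() would lose."""

    def __init__(self) -> None:
        self.total = 0.0
        self._comp = 0.0  # running compensation term

    def add(self, x: float) -> None:
        y = x - self._comp
        t = self.total + y
        self._comp = (t - self.total) - y  # rounding error of this step
        self.total = t

    def scale(self, factor: float) -> None:
        self.total *= factor
        self._comp *= factor

class PheromoneTrail:
    def __init__(self, half_life: float, prune_below: float = 1e-3) -> None:
        self.half_life = half_life
        self.prune_below = prune_below
        self.strength: dict[str, KahanAccumulator] = {}
        self.last_t = 0.0  # time of the most recent decay

    def deposit(self, route: str, amount: float) -> None:
        self.strength.setdefault(route, KahanAccumulator()).add(amount)

    def decay(self, now: float) -> None:
        """Multiply every trail by 0.5 ** (elapsed / half_life), then prune weak ones."""
        factor = 0.5 ** ((now - self.last_t) / self.half_life)
        self.last_t = now  # incremental reference: next decay measures from here
        for route in sorted(self.strength):  # sorted() copies, so deleting is safe
            acc = self.strength[route]
            acc.scale(factor)
            if acc.total < self.prune_below:
                del self.strength[route]

    def follow(self, route: str) -> float:
        acc = self.strength.get(route)
        return acc.total if acc else 0.0
```

Adding `1e-16` to `1.0` ten times shows the difference: each naive addition rounds back to `1.0`, while the compensation term carries the lost bits forward so the total lands within about one ulp of `math.fsum`.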
Implements the full API layer connecting OTTO's cognitive architecture
to the Anthropic Messages API:

- OTTOClient: SDK wrapper with lazy import, dependency injection,
  and response normalization (frozen APIResponse)
- EffortController: maps routing decisions to effort levels
  (protector/restorer → HIGH, agent team → HIGH, default → LOW)
  with cost estimation and gate thresholds
- NEXUSPipeline: full detect → route → effort → prompt → call
  pipeline with dry_run support and expert voice system prompts
- CompactionManager: Kahan-stable token tracking with threshold-
  based compaction triggering

78 new tests, 441 total passing. All tests use mock API clients.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
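The effort rules in `EffortController` reduce to a small lookup. The `Effort` enum, the `agent_team` flag, and the `CompactionManager` threshold fraction below are illustrative assumptions; only the HIGH/LOW mapping comes from the commit. Token counts are integers here, so plain addition is exact; Kahan-style tracking would only matter if fractional estimates were accumulated.

```python
from __future__ import annotations

from enum import Enum

class Effort(Enum):
    LOW = "low"
    HIGH = "high"

HIGH_EFFORT_EXPERTS = frozenset({"protector", "restorer"})

def effort_for(primary_expert: str, agent_team: bool = False) -> Effort:
    """protector/restorer -> HIGH, agent team -> HIGH, everything else -> LOW."""
    if primary_expert in HIGH_EFFORT_EXPERTS or agent_team:
        return Effort.HIGH
    return Effort.LOW

class CompactionManager:
    """Track token usage and report when a compaction threshold is crossed."""

    def __init__(self, limit: int, threshold: float = 0.8) -> None:
        self.limit = limit
        self.threshold = threshold  # hypothetical fraction of the limit
        self.used = 0

    def record(self, tokens: int) -> bool:
        """Add usage; return True once compaction should be triggered."""
        self.used += tokens
        return self.used >= self.threshold * self.limit
```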
Implements 4 Tier-1 OS services following OTTOService protocol:

- ClockService: time period, day type, time pressure (pure, no deps)
- ProcessMonitor: app context, context switches, process load
  (psutil with injection fallback)
- GitWatcher: commit velocity, uncommitted changes, stuck detection
  (subprocess git with injection fallback)
- FileSystemWatcher: activity level, file churn (internal event
  tracking or snapshot injection)

Plus:
- CategoricalSignal: frozen privacy-safe data type (Patent Claim #3)
- ServiceRegistry: lifecycle management + sorted signal collection
- PlatformInfo: OS/WSL2/dependency detection

All services enforce the privacy boundary: raw data (process names,
file paths, commit messages) stays inside the service. Only
categorical abstractions (coding/browsing, active/stalled, few/many)
cross into downstream processing.

102 new tests, 543 total passing. All services tested with
injected providers — no real psutil/watchdog/git calls in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
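The privacy boundary can be sketched as a service that reads raw data internally and emits only frozen categorical values. The `CategoricalSignal` fields, the hour boundaries, and the `signals()` method name are assumptions; the real `OTTOService` protocol is not shown in this PR.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Callable

@dataclass(frozen=True)
class CategoricalSignal:
    source: str    # which service produced it
    name: str      # e.g. "time_period"
    category: str  # a coarse label, never raw data

class ClockService:
    """Reads the raw clock internally; only coarse categories leave the service."""

    name = "clock"

    def __init__(self, now: Callable[[], datetime] = datetime.now) -> None:
        self._now = now  # injectable provider, so tests never touch the real clock

    def signals(self) -> list[CategoricalSignal]:
        current = self._now()
        hour = current.hour
        if hour < 5 or hour >= 22:
            period = "night"
        elif hour < 12:
            period = "morning"
        elif hour < 17:
            period = "afternoon"
        else:
            period = "evening"
        day_type = "weekend" if current.weekday() >= 5 else "weekday"
        signals = [
            CategoricalSignal(self.name, "time_period", period),
            CategoricalSignal(self.name, "day_type", day_type),
        ]
        return sorted(signals, key=lambda s: s.name)  # deterministic order
```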
Chat interface (ChatMessage, ConversationHistory, ChatSession),
dashboard state visualization (CognitiveSummary, DashboardState),
style constants, TUI skeleton, and MCP tool definitions with
dispatch handler. All user-facing strings verified constitutional.
84 tests, 627 total passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Days 16-18 capstone: 55 integration tests covering full pipeline
(PRISM→NEXUS→Effort→Prompt), ChatSession+Services flow, Memory+
Encryption roundtrip, Pheromone lifecycle, MCP end-to-end dispatch,
constitution enforcement across 20 varied inputs. Performance
benchmarks verify <10ms signal detection, <5ms routing, <20ms full
pipeline. Automated audit checks: no bare dict.items(), no clinical
language, no minimizing terms, safety floors immutable, privacy
boundary enforced, encryption verified, determinism confirmed (100x
repeated runs), conventional commits validated. Fixed "easy wins"
minimizing term in restorer voice. 682 total tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
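One way an automated audit could flag "no bare dict.items()" is an AST walk that reports any `.items()` call not passed directly to `sorted(...)`. This is a heuristic sketch, not the audit script from the commit: it cannot tell a dict from any other object with an `items()` method.

```python
import ast

def bare_items_calls(source: str) -> list[int]:
    """Line numbers of `.items()` calls that are not the first argument of sorted()."""
    tree = ast.parse(source)
    wrapped = set()
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "sorted" and node.args):
            wrapped.add(id(node.args[0]))  # the node object passed to sorted()
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "items"
        and not node.args
        and id(node) not in wrapped
    )
```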
…ewrite

Crash-recovery safety commit securing all in-progress v3-refactor work:

- Systematic He2025 attribution: "[He2025] Compliant" -> "inspired by [He2025]"
  across 300+ source files, tests, and documentation
- Remove old top-level otto/ package (84 files) — superseded by src/otto/
- Add otto_v3/ clean rewrite following CLAUDE.md Day 1-18 blueprint
  (core, api, services, mcp, ui modules)
- Enhance interactive CLI with improved session continuity and LLM integration
- Expand memory interface with richer query and retrieval capabilities
- Add He2025 attribution cleanup/thinning utility scripts
- Add .claude/ to .gitignore (local Claude Code config)

5,095 tests passing, 1 skipped. Zero failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oss codebase

The He2025 paper ("Defeating Nondeterminism in LLM Inference") addresses
GPU kernel-level batch invariance. OTTO applies these *principles* at the
application layer (sorted iteration, Kahan summation, fixed seeds), which
is inspired by but distinct from the paper's kernel-level techniques.

This commit:
- Removes "ThinkingMachines" branding from 107 files (src, tests, docs,
  dashboard, configs, CI workflows, semgrep rules)
- Renames check_he2025_compliance -> check_determinism_patterns (with
  backward-compat alias to avoid import breakage)
- Changes HE2025_COMPLIANT -> HE2025_PRINCIPLES_APPLIED
- Updates trail signals: he2025_compliant -> determinism_check_passed
- Corrects overclaims: "is ThinkingMachines Determinism" -> "applies
  determinism principles inspired by [He2025]"
- Keeps legitimate [He2025] citations as proper academic references
- Only remaining refs: cleanup scripts (intentional) and 1 archived doc

Tests: 5,095 passed, 1 skipped (unchanged)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
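A backward-compatible rename such as `check_he2025_compliance` -> `check_determinism_patterns` is commonly done with a thin alias that emits a `DeprecationWarning`. The commit only says an alias was kept; the warning and the placeholder check body here are assumptions.

```python
import warnings

def check_determinism_patterns(source: str) -> bool:
    """Placeholder body: the real check is not shown in this PR."""
    return ".items()" not in source

def check_he2025_compliance(source: str) -> bool:
    """Deprecated alias kept so existing imports keep working."""
    warnings.warn(
        "check_he2025_compliance is deprecated; use check_determinism_patterns",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this shim
    )
    return check_determinism_patterns(source)
```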
Complete rewrite from cognitive OS to focused commitment tracker.
WhatsApp watcher, Claude-powered detection, SQLite store, nudge
system, Click CLI. 71 tests passing, 1,126 lines of source code.

Phase 6 (real WhatsApp test) is the human gate before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
830 lines of cognitive architecture down to 253 lines describing
what's actually built. Every section maps to real code in otto_v4/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move raw SQL from cli.py to store.get_all() and store.avg_follow_ups_done()
- Fix detector.py to parse deadline from Claude response JSON
- Remove unused apscheduler dependency
- Create conftest.py with shared store fixture, deduplicate test_nudge/test_store
- Add 14 watcher tests (verification, message processing, signatures)
- Add 7 tests for new store methods and detector deadline parsing
- Fix Pydantic v2 deprecation warning in watcher.py
- Delete broken Windows path dirs and nul artifact

92 tests passing (was 71).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
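The "signatures" tests refer to webhook authentication. Meta's webhooks sign each POST body with HMAC-SHA256 keyed by the app secret and send the result in the `X-Hub-Signature-256` header as `sha256=<hex>`; subscription is confirmed by echoing `hub.challenge` when `hub.verify_token` matches. A standard-library sketch of both checks follows. The function names are hypothetical and this is not the code in `watcher.py`.

```python
from __future__ import annotations

import hashlib
import hmac

PREFIX = "sha256="

def valid_signature(raw_body: bytes, header_value: str | None, app_secret: str) -> bool:
    """Check an `X-Hub-Signature-256: sha256=<hex>` header against the raw body."""
    if not header_value or not header_value.startswith(PREFIX):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value[len(PREFIX):])

def verification_response(params: dict[str, str], verify_token: str) -> str | None:
    """Subscription handshake: echo hub.challenge only if the token matches."""
    if (params.get("hub.mode") == "subscribe"
            and params.get("hub.verify_token") == verify_token):
        return params.get("hub.challenge")
    return None
```

The signature must be computed over the raw request bytes; re-serializing parsed JSON can change whitespace or key order and break the match.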
Deleted 699 files: v3 source (282 .py), v3 tests (168 .py), docs,
benchmarks, config, deploy, data, scripts, dashboard, MCP packages,
15 root markdown manifestos, 9.1 MB logo, broken CI workflows.

Zero He2025 references remain. Zero ThinkingMachines references remain.

What's left: 24 files — OTTO v4 commitment tracker, v4 CI, license.
92 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Feb 10, 2026

Important

Review skipped

Too many files!

This PR contains 290 files, which is 140 over the limit of 150. Please upgrade to a paid plan to get higher limits.

Joseph Ibrahim and others added 3 commits February 10, 2026 15:26
Claude sometimes wraps JSON in ```json ... ``` fences even when told
to respond with raw JSON. Strip the fences before parsing. Found
during Phase 6 live test (93 tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
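A sketch of the fence-stripping fix, combined with the deadline parsing and 0.7 confidence cut described earlier in this PR. The JSON field names (`commitments`, `text`, `confidence`, `deadline`) are assumptions about the detector's prompt contract, not taken from `detector.py`.

```python
from __future__ import annotations

import json
import re

# An optional ```json (or bare ```) fence wrapped around the whole reply.
FENCE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n?\s*```\s*$", re.DOTALL)

def strip_fences(reply: str) -> str:
    match = FENCE.match(reply)
    return match.group(1) if match else reply.strip()

def parse_commitments(reply: str, min_confidence: float = 0.7) -> list[dict]:
    """Parse the model reply, keeping commitments at or above the threshold."""
    data = json.loads(strip_fences(reply))
    return [
        {"text": c["text"], "confidence": c["confidence"], "deadline": c.get("deadline")}
        for c in data.get("commitments", [])
        if c.get("confidence", 0.0) >= min_confidence
    ]
```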
- MAX_NUDGES = 3 (was 5) — aligns with CLAUDE.md interaction budget:
  "If OTTO sends more than 3 nudges in a day, something is wrong"
- CLAUDE.md: soul section (user edit), stale numbers updated:
  - 6 test files, 93 tests (was 5 files, 71)
  - Phase 6 marked DONE (live test 2026-02-10)
  - APScheduler removed from deps list (already gone from pyproject.toml)
  - v3 reference removed from dev environment
- test_nudge: template rotation test made robust (10 samples, not 3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
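A sketch of how the nudge cap could combine with the 24h cooldown from the PR summary. It reads `MAX_NUDGES` as a daily cap across all commitments, matching the quoted interaction budget; that reading is an assumption, and the real code may apply the cap per commitment instead. `select_nudges` and its inputs are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta

MAX_NUDGES = 3                  # from this commit (was 5)
COOLDOWN = timedelta(hours=24)  # from the PR summary

def select_nudges(
    candidates: list[tuple[int, datetime | None]], sent_today: int, now: datetime
) -> list[int]:
    """Pick commitment ids to nudge now.

    candidates: (id, last_nudged_at or None) pairs, oldest commitment first.
    sent_today: nudges already sent during the current day.
    """
    budget = max(0, MAX_NUDGES - sent_today)
    chosen: list[int] = []
    for cid, last in candidates:
        if len(chosen) >= budget:
            break  # daily budget exhausted
        if last is None or now - last >= COOLDOWN:
            chosen.append(cid)
    return chosen
```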
Python's hash() is randomized per process (PYTHONHASHSEED). Two tests
relied on hash distribution hitting different template indices, which
failed on ubuntu-latest + Python 3.13 CI. Fixed:

- test_different_counts_different_templates: 10 samples instead of 3
- test_overdue_templates_include_who_to: check template strings directly
  instead of hoping hash selects a template containing {who_to}

Verified stable across 5 random PYTHONHASHSEED values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
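This commit fixed the tests rather than the selection logic. If template choice ever needs to be reproducible across processes, one option (an assumption, not what OTTO does) is to replace `hash()` with a stable checksum such as `zlib.crc32`, whose value does not depend on `PYTHONHASHSEED`. The template strings and key format below are hypothetical.

```python
import zlib

OVERDUE_TEMPLATES = (
    "Still planning to {what} for {who_to}?",
    "Quick check: {what} for {who_to} is past due.",
    "{who_to} is still waiting on {what}. Want to park it?",
)

def pick_template(commitment_id: int, follow_up_count: int) -> str:
    # str hash() is salted per process; crc32 of the same bytes never changes.
    key = f"{commitment_id}:{follow_up_count}".encode()
    return OVERDUE_TEMPLATES[zlib.crc32(key) % len(OVERDUE_TEMPLATES)]
```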
@JosephOIbrahim JosephOIbrahim merged commit 7c42fd8 into master Feb 10, 2026
7 checks passed
@JosephOIbrahim JosephOIbrahim deleted the v4-reset branch February 10, 2026 20:57

Labels

None yet

Projects

None yet

1 participant