v4.0: Strip to commitment loop #1
Merged
Implements the immutable foundation for OTTO OS v3.0:
- ConstitutionalPrinciples: 10 frozen principles (frozen dataclass)
- SafetyFloors: protector=10%, decomposer=5%, restorer=5% (frozen)
- validate() function to assert constitutional invariants at runtime
- CLINICAL_BLOCKLIST tuple for user-facing string compliance
- 30 passing tests covering immutability, values, validation, language
Also scaffolds the v3 package structure (otto/) alongside existing src/otto/ (v0.7), bumps version to 3.0.0-dev, and adds CLAUDE.md project spec.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
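A minimal sketch of how frozen safety floors and a runtime check could fit together. The class name, field names, and percentages come from the commit message above; the body of validate() and its signature are assumptions for illustration, not the project's actual code.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class SafetyFloors:
    """Minimum routing weights (values from the commit message)."""
    protector: float = 0.10
    decomposer: float = 0.05
    restorer: float = 0.05


def validate(floors: SafetyFloors) -> None:
    """Hypothetical invariant check: floors must equal the frozen defaults."""
    if floors != SafetyFloors():
        raise ValueError(f"safety floors altered: {floors!r}")


floors = SafetyFloors()
validate(floors)  # passes silently for the default values

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    floors.protector = 0.0  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the dataclass blocks attribute assignment, while the separate validate() call catches an instance constructed with different values.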
Implements Patent Claim #1, layered cognitive property composition inspired by Pixar USD composition arcs:
L(0) Learned → I(1) Inherited → V(2) Volatile → R(3) Reactive → P(4) Protective → S(5) Sovereign
Core modules:
- layers.py: LayerName IntEnum, Layer dataclass, LayerStack collection
- properties.py: CognitiveProperty (frozen) with source_layer tracking
- compositor.py: LIVRPSCompositor with resolve(), resolve_all(), resolve_with_audit(), layer activation/deactivation
[He2025] compliance:
- All iteration uses sorted(), no bare dict.items()
- resolve_all() output sorted by property name
- Descending priority traversal via IntEnum ordering
- 100x determinism test confirms identical results
42 new tests, 72 total (+ Day 1), all passing in 0.25s.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
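The layered resolution described above might look roughly like the sketch below. It assumes that a higher IntEnum value means higher priority and that the first active layer holding a property wins. The Layer fields and the function signatures are illustrative, not the real compositor API.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LayerName(IntEnum):
    LEARNED = 0
    INHERITED = 1
    VOLATILE = 2
    REACTIVE = 3
    PROTECTIVE = 4
    SOVEREIGN = 5


@dataclass
class Layer:
    name: LayerName
    values: dict = field(default_factory=dict)  # property name -> value
    active: bool = True


def resolve(layers, prop):
    """Return (value, source_layer) from the highest-priority active layer."""
    for layer in sorted(layers, key=lambda item: item.name, reverse=True):
        if layer.active and prop in layer.values:
            return layer.values[prop], layer.name
    return None


def resolve_all(layers):
    """Resolve every property of the active layers, sorted by property name."""
    names = sorted({p for layer in layers if layer.active for p in layer.values})
    return [(name, resolve(layers, name)) for name in names]


layers = [
    Layer(LayerName.LEARNED, {"energy": "high", "pace": "fast"}),
    Layer(LayerName.PROTECTIVE, {"energy": "low"}),
]
energy = resolve(layers, "energy")  # the PROTECTIVE layer overrides LEARNED
```

Deactivating the PROTECTIVE layer makes the same call fall through to the LEARNED value, which is the effect the activation/deactivation bullet describes.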
Implements Stage 1 (local, on-device) of the PRISM detection pipeline:
Signals (19 total):
- 8 primary cognitive states: FRUSTRATED, OVERWHELMED, DEPLETED, STUCK, EXPLORING, FOCUSED, HYPERFOCUS, CRASHED
- 6 action signals: commitment tracking, meetings, tasks, decisions
- 5 ambient signals: energy levels, context switches, crash zones
Detection engine:
- 28 regex patterns sorted by (signal_type.name, regex) for [He2025]
- detect(text) returns signals sorted by (-confidence, signal_name)
- detect_primary(text) returns highest-confidence signal or None
- Deduplication: multiple patterns per signal type → keep best confidence
- Deterministic tiebreaker: alphabetical signal name at equal confidence
[He2025] compliance:
- PATTERNS is a tuple (immutable), sorted at module load time
- Pattern evaluation in fixed order
- Output sorted with explicit tiebreaker
- 100x determinism test across 7 sample texts, all identical
52 new tests, 124 total (+ Days 1-2), all passing in 0.31s.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
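The detection rules above (fixed pattern order, per-signal deduplication, sort by confidence with an alphabetical tiebreaker) can be sketched as follows. The four patterns and their confidence values are invented for illustration; only the signal names and the function names detect() and detect_primary() come from the commit message.

```python
import re
from typing import NamedTuple


class Signal(NamedTuple):
    name: str
    confidence: float


# Hypothetical patterns as (signal name, regex, confidence), sorted once at load.
PATTERNS = tuple(sorted([
    ("STUCK", r"\bstuck\b", 0.8),
    ("FRUSTRATED", r"\bugh\b", 0.8),
    ("FRUSTRATED", r"\bbroken\b", 0.6),
    ("DEPLETED", r"\bexhausted\b", 0.9),
]))


def detect(text):
    """Return matched signals sorted by (-confidence, name)."""
    best = {}
    for name, pattern, confidence in PATTERNS:  # fixed evaluation order
        if re.search(pattern, text, re.IGNORECASE):
            # Deduplicate: keep the best confidence per signal type.
            best[name] = max(confidence, best.get(name, 0.0))
    signals = [Signal(name, conf) for name, conf in sorted(best.items())]
    return sorted(signals, key=lambda s: (-s.confidence, s.name))


def detect_primary(text):
    """Return the highest-confidence signal, or None if nothing matched."""
    signals = detect(text)
    return signals[0] if signals else None


result = detect("UGH this is broken and I'm stuck")
```

Here FRUSTRATED and STUCK tie at the same confidence, so the alphabetical tiebreaker decides the order, which keeps repeated runs identical.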
Implements the cognitive routing pipeline (Patent Claim #2) with 5 deterministic phases:
Phase 1: ACTIVATE - signal-to-expert mapping via trigger sets
Phase 2: WEIGHT - confidence * affinity + state boosts, clamped [0,1]
Phase 3: BOUND - constitutional safety floors enforced (immutable)
Phase 4: SELECT - primary + up to 2 supporting (>0.20 threshold)
Phase 5: UPDATE - route callback stub (pheromone trails Day 7)
7 experts with signal affinities:
Protector (floor 10%): FRUSTRATED, OVERWHELMED, CRASHED
Decomposer (floor 5%): STUCK, OVERWHELMED, TASK_IMPLIED
Restorer (floor 5%): DEPLETED, LOW_ENERGY, CRASH_ZONE, CRASHED
Redirector: CONTEXT_SWITCH
Acknowledger: HIGH_ENERGY, DECISION_MADE, FOCUSED
Guide: EXPLORING, DECISION_MADE, FOLLOW_UP_NEEDED
Executor: FOCUSED, TASK_IMPLIED, HYPERFOCUS, COMMITMENTS
State boosts from LIVRPS-resolved properties (energy, burnout, momentum) influence weighting without breaking determinism.
[He2025] compliance:
- ALL_EXPERTS tuple sorted by name at module load
- STATE_BOOSTS tuple sorted by (property, value, expert)
- All phase iterations use sorted order
- ExpertSelection tiebreaker: (-value, expert_name)
- 100 random inputs verify safety floors hold
51 new tests, 175 total (+ Days 1-3), all passing in 0.34s.
5 full-pipeline integration tests verify PRISM → NEXUS end-to-end:
"UGH this is broken" → protector
"I'm completely stuck" → decomposer
"I'm exhausted" → restorer
"ready to go" → executor
"what if we tried" → guide
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
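A rough sketch of the BOUND and SELECT phases, under the assumption that a floor is enforced by raising any lower weight to it. The floor values, the 0.20 threshold, and the (-value, expert_name) tiebreaker come from the commit message. The function names bound() and select() and the dict-based representation are hypothetical.

```python
FLOORS = {"decomposer": 0.05, "protector": 0.10, "restorer": 0.05}
SUPPORT_THRESHOLD = 0.20
MAX_SUPPORTING = 2


def bound(weights):
    """Clamp weights to [0, 1], then lift floored experts to their minimum."""
    bounded = {}
    for name in sorted(set(weights) | set(FLOORS)):
        value = min(1.0, max(0.0, weights.get(name, 0.0)))
        bounded[name] = max(value, FLOORS.get(name, 0.0))
    return bounded


def select(weights):
    """Pick the primary expert plus up to two supporting experts."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    primary = ranked[0][0]
    supporting = [name for name, value in ranked[1:] if value > SUPPORT_THRESHOLD]
    return primary, supporting[:MAX_SUPPORTING]


bounded = bound({"executor": 0.9, "guide": 0.3, "protector": 0.0})
primary, supporting = select(bounded)
```

In this example the protector weight of 0.0 is lifted to its 10% floor, yet it stays below the supporting threshold, so only guide supports executor.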
… isolation
Four-type cognitive memory (episodic, procedural, contextual, identity) backed by SQLite with WAL mode. Read-before-write invariant prevents blind overwrites of cognitive data. Identity memory constitutionally excluded from sync/export.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
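One way to read the read-before-write invariant is as a version check: a writer must present the version it last read, and a mismatch is rejected. The sketch below takes that interpretation as an assumption. The table layout, the function names, and StaleWriteError are all hypothetical; only SQLite and WAL mode are from the commit message.

```python
import os
import sqlite3
import tempfile


class StaleWriteError(Exception):
    """Raised when a write does not follow a current read."""


def open_store(path):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # write-ahead logging
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def read(conn, key):
    """Return (value, version); an absent key reads as (None, 0)."""
    row = conn.execute(
        "SELECT value, version FROM memory WHERE key = ?", (key,)
    ).fetchone()
    return row if row else (None, 0)


def write(conn, key, value, seen_version):
    """Write only if seen_version matches what is stored right now."""
    with conn:  # commits on success, rolls back on exception
        _, current_version = read(conn, key)
        if current_version != seen_version:
            raise StaleWriteError(f"{key!r} changed since it was last read")
        conn.execute(
            "INSERT OR REPLACE INTO memory (key, value, version) VALUES (?, ?, ?)",
            (key, value, current_version + 1),
        )
    return current_version + 1


with tempfile.TemporaryDirectory() as tmp:
    conn = open_store(os.path.join(tmp, "memory.db"))
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    _, version = read(conn, "focus")  # read first (absent key: version 0)
    version = write(conn, "focus", "deep work", version)
    try:
        write(conn, "focus", "overwritten", 0)  # stale version
        blind_write_rejected = False
    except StaleWriteError:
        blind_write_rejected = True
    stored_value, stored_version = read(conn, "focus")
    conn.close()
```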
Key wrapping design: random master key encrypted by passphrase-derived wrapping key (Argon2id, memory-hard). Recovery key is the master key hex-encoded, verified against a stored verification blob. Master key never touches disk in plaintext. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Patent Claim #4: distributed learning through deposit/follow/decay. Kahan accumulator for numerically stable float aggregation (O(eps) vs O(n*eps) error bound). Named seed constants for [He2025] determinism. Half-life decay with incremental time reference and threshold pruning. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
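For reference, Kahan (compensated) summation and half-life decay can be sketched as below. The class and function names are illustrative rather than the project's, and the decay here takes an explicit elapsed time instead of the incremental time reference the commit mentions.

```python
class KahanAccumulator:
    """Compensated summation: carries the rounding error of each addition."""

    def __init__(self):
        self.total = 0.0
        self._compensation = 0.0

    def add(self, value):
        y = value - self._compensation
        t = self.total + y
        self._compensation = (t - self.total) - y  # low-order bits lost in t
        self.total = t


def decay(strength, elapsed, half_life):
    """Halve the strength every half_life units of elapsed time."""
    return strength * 0.5 ** (elapsed / half_life)


def prune(trails, threshold):
    """Drop trails whose strength fell below the threshold (sorted iteration)."""
    return {key: value for key, value in sorted(trails.items()) if value >= threshold}


values = [0.1] * 1000
acc = KahanAccumulator()
naive = 0.0
for v in values:
    acc.add(v)
    naive += v  # plain accumulation, for comparison

trails = {"a": decay(1.0, 10.0, 10.0), "b": decay(1.0, 100.0, 10.0)}
kept = prune(trails, threshold=0.01)
```

The comparison uses an explicit loop rather than the built-in sum(), because recent Python versions apply their own compensation when summing floats.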
Implements the full API layer connecting OTTO's cognitive architecture to the Anthropic Messages API:
- OTTOClient: SDK wrapper with lazy import, dependency injection, and response normalization (frozen APIResponse)
- EffortController: maps routing decisions to effort levels (protector/restorer → HIGH, agent team → HIGH, default → LOW) with cost estimation and gate thresholds
- NEXUSPipeline: full detect → route → effort → prompt → call pipeline with dry_run support and expert voice system prompts
- CompactionManager: Kahan-stable token tracking with threshold-based compaction triggering
78 new tests, 441 total passing. All tests use mock API clients.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements 4 Tier-1 OS services following OTTOService protocol:
- ClockService: time period, day type, time pressure (pure, no deps)
- ProcessMonitor: app context, context switches, process load (psutil with injection fallback)
- GitWatcher: commit velocity, uncommitted changes, stuck detection (subprocess git with injection fallback)
- FileSystemWatcher: activity level, file churn (internal event tracking or snapshot injection)
Plus:
- CategoricalSignal: frozen privacy-safe data type (Patent Claim #3)
- ServiceRegistry: lifecycle management + sorted signal collection
- PlatformInfo: OS/WSL2/dependency detection
All services enforce the privacy boundary: raw data (process names, file paths, commit messages) stays inside the service. Only categorical abstractions (coding/browsing, active/stalled, few/many) cross into downstream processing.
102 new tests, 543 total passing. All services tested with injected providers; no real psutil/watchdog/git calls in tests.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chat interface (ChatMessage, ConversationHistory, ChatSession), dashboard state visualization (CognitiveSummary, DashboardState), style constants, TUI skeleton, and MCP tool definitions with dispatch handler. All user-facing strings verified constitutional. 84 tests, 627 total passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Days 16-18 capstone: 55 integration tests covering full pipeline (PRISM→NEXUS→Effort→Prompt), ChatSession+Services flow, Memory+Encryption roundtrip, Pheromone lifecycle, MCP end-to-end dispatch, constitution enforcement across 20 varied inputs.
Performance benchmarks verify <10ms signal detection, <5ms routing, <20ms full pipeline.
Automated audit checks: no bare dict.items(), no clinical language, no minimizing terms, safety floors immutable, privacy boundary enforced, encryption verified, determinism confirmed (100x repeated runs), conventional commits validated.
Fixed "easy wins" minimizing term in restorer voice.
682 total tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ewrite
Crash-recovery safety commit securing all in-progress v3-refactor work:
- Systematic He2025 attribution: "[He2025] Compliant" -> "inspired by [He2025]" across 300+ source files, tests, and documentation
- Remove old top-level otto/ package (84 files), superseded by src/otto/
- Add otto_v3/ clean rewrite following CLAUDE.md Day 1-18 blueprint (core, api, services, mcp, ui modules)
- Enhance interactive CLI with improved session continuity and LLM integration
- Expand memory interface with richer query and retrieval capabilities
- Add He2025 attribution cleanup/thinning utility scripts
- Add .claude/ to .gitignore (local Claude Code config)
5,095 tests passing, 1 skipped. Zero failures.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…oss codebase
The He2025 paper ("Defeating Nondeterminism in LLM Inference") addresses
GPU kernel-level batch invariance. OTTO applies these *principles* at the
application layer (sorted iteration, Kahan summation, fixed seeds), which
is inspired by but distinct from the paper's kernel-level techniques.
This commit:
- Removes "ThinkingMachines" branding from 107 files (src, tests, docs,
dashboard, configs, CI workflows, semgrep rules)
- Renames check_he2025_compliance -> check_determinism_patterns (with
backward-compat alias to avoid import breakage)
- Changes HE2025_COMPLIANT -> HE2025_PRINCIPLES_APPLIED
- Updates trail signals: he2025_compliant -> determinism_check_passed
- Corrects overclaims: "is ThinkingMachines Determinism" -> "applies
determinism principles inspired by [He2025]"
- Keeps legitimate [He2025] citations as proper academic references
- Only remaining refs: cleanup scripts (intentional) and 1 archived doc
Tests: 5,095 passed, 1 skipped (unchanged)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Complete rewrite from cognitive OS to focused commitment tracker. WhatsApp watcher, Claude-powered detection, SQLite store, nudge system, Click CLI. 71 tests passing, 1,126 lines of source code. Phase 6 (real WhatsApp test) is the human gate before merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
830 lines of cognitive architecture down to 253 lines describing what's actually built. Every section maps to real code in otto_v4/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move raw SQL from cli.py to store.get_all() and store.avg_follow_ups_done()
- Fix detector.py to parse deadline from Claude response JSON
- Remove unused apscheduler dependency
- Create conftest.py with shared store fixture, deduplicate test_nudge/test_store
- Add 14 watcher tests (verification, message processing, signatures)
- Add 7 tests for new store methods and detector deadline parsing
- Fix Pydantic v2 deprecation warning in watcher.py
- Delete broken Windows path dirs and nul artifact
92 tests passing (was 71).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deleted 699 files: v3 source (282 .py), v3 tests (168 .py), docs, benchmarks, config, deploy, data, scripts, dashboard, MCP packages, 15 root markdown manifestos, 9.1 MB logo, broken CI workflows.
Zero He2025 references remain. Zero ThinkingMachines references remain.
What's left: 24 files, namely the OTTO v4 commitment tracker, v4 CI, and license. 92 tests passing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude sometimes wraps JSON in ```json ... ``` fences even when told to respond with raw JSON. Strip the fences before parsing. Found during Phase 6 live test (93 tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
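A minimal sketch of the fence stripping this commit describes, assuming the response is either raw JSON or JSON wrapped in a single Markdown code fence. The function names and the regular expression are illustrative, not the code in detector.py.

```python
import json
import re

TICKS = "`" * 3  # the Markdown fence marker, built here to keep the source readable

# Optional language tag after the opening fence, then the payload, then the closing fence.
_FENCE = re.compile(
    r"^\s*" + TICKS + r"(?:json)?\s*(.*?)\s*" + TICKS + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def strip_json_fences(text):
    """Return the payload without a surrounding code fence, if one is present."""
    match = _FENCE.match(text)
    return match.group(1) if match else text.strip()


def parse_response(text):
    """Parse model output as JSON, tolerating a wrapping code fence."""
    return json.loads(strip_json_fences(text))


fenced = TICKS + 'json\n{"commitment": true, "deadline": null}\n' + TICKS
raw = '{"commitment": true, "deadline": null}'
parsed_fenced = parse_response(fenced)
parsed_raw = parse_response(raw)
```

Unfenced input passes through unchanged apart from whitespace trimming, so the same parser handles both response shapes.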
- MAX_NUDGES = 3 (was 5), aligning with the CLAUDE.md interaction budget: "If OTTO sends more than 3 nudges in a day, something is wrong"
- CLAUDE.md: soul section (user edit), stale numbers updated:
  - 6 test files, 93 tests (was 5 files, 71)
  - Phase 6 marked DONE (live test 2026-02-10)
  - APScheduler removed from deps list (already gone from pyproject.toml)
  - v3 reference removed from dev environment
- test_nudge: template rotation test made robust (10 samples, not 3)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Python's hash() is randomized per process (PYTHONHASHSEED). Two tests
relied on hash distribution hitting different template indices, which
failed on ubuntu-latest + Python 3.13 CI. Fixed:
- test_different_counts_different_templates: 10 samples instead of 3
- test_overdue_templates_include_who_to: check template strings directly
instead of hoping hash selects a template containing {who_to}
Verified stable across 5 random PYTHONHASHSEED values.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
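The commit above fixes the tests and does not change how templates are selected. For illustration only, the sketch below shows the direct-check style of assertion, along with one alternative the commit does not adopt: deriving the index from a stable digest so it no longer depends on PYTHONHASHSEED. The template strings and function names are hypothetical.

```python
import hashlib

# Hypothetical overdue templates; the real ones live in the nudge module.
OVERDUE_TEMPLATES = (
    "You told {who_to} you'd handle this.",
    "This one is past its date.",
    "Still open for {who_to}.",
)


def stable_index(key, size):
    """Map a string to an index in [0, size) identically in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


def pick_template(key, templates):
    return templates[stable_index(key, len(templates))]


# Direct check: inspect the template strings rather than hoping a hash
# happens to select one that contains the placeholder.
templates_with_who_to = [t for t in OVERDUE_TEMPLATES if "{who_to}" in t]

first = pick_template("commitment-1", OVERDUE_TEMPLATES)
second = pick_template("commitment-1", OVERDUE_TEMPLATES)
```

The built-in hash() of a str varies between processes unless PYTHONHASHSEED is fixed, whereas a SHA-256 digest of the same bytes does not.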
Summary
What v4 does
- ~/.otto/commitments.db), no ORM
- otto list, otto add, otto done, otto park, otto nudge, otto stats, otto watch, otto nuke
Commits
- d7e56f2..fa4144e: v3 build history (kept for archaeology)
- 2444164: He2025 attribution cleanup
- 0c2525d: v4.0 build (phases 0-5)
- ca136a1: CLAUDE.md rewrite
- 0cbc5ab: 9 fixes from codebase audit (71 → 92 tests)
- 542c565: v3 removal (−255,798 lines)
Net change
Test plan
- python -m pytest otto_v4/tests/ -v -m "not integration": 92 tests pass
- otto list, otto add, otto stats: CLI works end-to-end
- tests.yml) runs on Python 3.11-3.13, ubuntu + windows
- ~/.otto/commitments.db
🤖 Generated with Claude Code