Worktree fix: Auto-clean stale worktrees before re-creating them instead of crashing with "already exists". Handles engine crashes gracefully by running `git worktree remove`, `git worktree prune`, and a fallback rm_rf.
Seatbelt sandbox: Wire the existing sandbox.rs framework into actual agent execution. On macOS with backend="os-native", agents run under sandbox-exec with a restrictive seatbelt profile:
- Write: only worktree, scratch dir, /tmp, cargo cache, .claude
- Read: system paths, Rust toolchain, agent configs
- Network: allowed (API access)
- Per-task scratch dir created under worktrees/scratch/TASK-NNNN/
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
create_branch_detached used force=false, so existing branches from previous runs kept their old commit pointer. Agents then worked on stale code (up to 5 commits behind main), causing all gate checks to fail. Now uses force=true to always update the branch to current HEAD, with a test proving the behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The restrictive file-read* subpath rules and limited IPC/mach/sysctl permissions caused sandbox-exec to abort with SIGABRT (exit 134) on every agent invocation. Switched to:
- Unrestricted reads (dyld and system frameworks need unpredictable paths)
- Writes restricted to worktree + scratch + tmp + caches
- Broad process/ipc/mach/sysctl wildcards
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
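The resulting policy shape can be sketched as a small profile generator. This is an illustration under assumptions, not Thrum's sandbox.rs: the function name `seatbelt_profile`, the `(deny default)` base, and the rule ordering are hypothetical, and SBPL operation names should be checked against Apple's sandbox documentation before relying on them.

```rust
use std::fmt::Write;
use std::path::Path;

/// Hypothetical generator for a seatbelt (SBPL) profile in the shape the commit
/// describes: open reads, allow-listed writes, broad process/IPC/Mach/sysctl.
fn seatbelt_profile(write_paths: &[&Path], allow_network: bool) -> String {
    let mut profile = String::from("(version 1)\n(deny default)\n");
    // dyld and system frameworks read from unpredictable paths, so reads stay open.
    profile.push_str("(allow file-read*)\n");
    // Narrow process/ipc/mach/sysctl rules caused SIGABRT (exit 134); use wildcards.
    for op in ["process*", "ipc*", "mach*", "sysctl*"] {
        writeln!(profile, "(allow {})", op).unwrap();
    }
    if allow_network {
        profile.push_str("(allow network*)\n");
    }
    for path in write_paths {
        // Assumes paths contain no double quotes (no SBPL string escaping here).
        writeln!(profile, "(allow file-write* (subpath \"{}\"))", path.display()).unwrap();
    }
    profile
}
```

Seatbelt matches resolved paths, so on macOS the `/tmp` entry would typically be passed as `/private/tmp`.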
Implement harness-first engineering for acceptance criteria: each criterion now gets a verification tag (TEST, LINT, BENCH, MANUAL, BROWSER, SECURITY) specifying HOW it will be verified. This creates full traceability from requirement to verification result.
Key changes:
- New verification module with VerificationTag, TaggedCriterion, parsing, audit, enrichment, and gate result mapping
- Pre-dispatch audit validates that all criteria have tags before a task moves from Pending to Implementing; auto-enriches if needed
- Gate 1 and Gate 2 results map back to specific tagged criteria
- Dashboard shows per-criterion verification checklist with verified/failed/pending status icons
- Planner agent prompt updated to require verification tags on all acceptance criteria
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch sandbox backend from "observe" to "os-native" for enforcement
- Set network = true (seatbelt needs it for Anthropic API access)
- Add agents/implementer_thrum.md with worktree containment instructions
- Include agent-produced CI module, lifecycle tests, dashboard updates
- Add CI config examples to minimal and pulseengine repos.toml
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Git worktrees store metadata (HEAD, refs, index) in the main repo's .git/worktrees/<name>/ directory, not in the worktree itself. The seatbelt profile was only allowing writes to the worktree dir, so agents could write code but git commit silently failed. Now reads the .git file in the worktree to discover the gitdir path and adds it to the seatbelt allow-list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
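Discovering the admin directory amounts to reading the one-line `.git` file that `git worktree add` leaves in the worktree (`gitdir: <path>`). A minimal parser, using the hypothetical name `worktree_gitdir` and resolving relative paths against the worktree:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: given the contents of a worktree's `.git` file,
/// return the admin directory where git writes HEAD, index and per-worktree refs.
fn worktree_gitdir(worktree: &Path, dot_git_contents: &str) -> Option<PathBuf> {
    let line = dot_git_contents.lines().next()?;
    let raw = line.strip_prefix("gitdir:")?.trim();
    if raw.is_empty() {
        return None;
    }
    let path = Path::new(raw);
    // Relative gitdir entries are resolved against the worktree directory.
    Some(if path.is_absolute() {
        path.to_path_buf()
    } else {
        worktree.join(path)
    })
}
```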
Agents were writing code but never committing, causing "no changes" failures. Added step 8 (git add && git commit) to implementer_thrum.md and a CRITICAL reminder in the containment note appended to every implementation prompt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- default_branch() now checks refs directly as a fallback when find_branch fails in worktree context (was returning "master")
- branch_has_commits_beyond_main error handler now assumes changes exist (fail-safe) instead of discarding work
- git worktree add uses --force to handle stale registrations
- Bump budget ceiling to 2000
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
On each retry, the full task description (including all previous retry blocks) was wrapped with yet another retry block. After 10+ retries the prompt became so large agents timed out before writing code. Now extracts only the base description (before any retry blocks) and appends just the current retry context. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
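The fix keeps exactly one retry block in the prompt. A sketch, assuming a hypothetical `RETRY_MARKER` heading that opens every retry block (the real delimiter is not shown in the commit):

```rust
/// Hypothetical marker that opens every retry block appended to a description.
const RETRY_MARKER: &str = "\n\n## Retry attempt";

/// The original task description: everything before the first retry block.
fn base_description(description: &str) -> &str {
    match description.find(RETRY_MARKER) {
        Some(idx) => &description[..idx],
        None => description,
    }
}

/// Build the prompt for retry `attempt`, carrying only the latest failure context.
fn retry_prompt(description: &str, attempt: u32, error_context: &str) -> String {
    format!(
        "{}{} {}\n\n{}",
        base_description(description),
        RETRY_MARKER,
        attempt,
        error_context
    )
}
```

Because `retry_prompt` always starts from `base_description`, prompt size stays bounded by the original description plus one error context, however many retries occur.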
…it hooks
Three root causes fixed for "review agent says no changes":
1. diff_summary() compared main vs HEAD on the main repo (where HEAD=main),
giving zero diff. Added diff_summary_for_branch() to diff main vs task branch.
2. Claude CLI --output-format json returns a JSON array of events, not a single
object. Rewrote parse_claude_output to handle both formats.
3. Reviewer only received stats ("X files changed"), not actual code. Now sends
full unified diff patch in the review prompt.
Added pre-commit hook installation in worktrees (cargo fmt + clippy) so agents
get immediate feedback at commit time instead of wasting full gate cycles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create TraceRecords at each pipeline stage:
  - Requirement record when task enters Implementing (if requirement_id set)
  - Design record linking task description as design rationale
  - Implementation record with branch, commit SHA, and files changed
  - Test records at Gate 1 (Quality), Gate 2 (Proof), Gate 3 (Integration)
  - Proof records for Z3/Rocq formal verification checks in Gate 2
  - Review record when reviewer agent reports
- Add TraceStore.list_all() with optional task_id/requirement_id filters
- Add TraceabilityMatrix.from_records() to build matrix from trace records
- Add GitRepo.changed_files_on_branch() for implementation trace data
- Add API endpoints:
  - GET /api/v1/traces/records - list trace records filtered by task/requirement
  - GET /api/v1/traces/matrix - build and return TraceabilityMatrix
  - GET /api/v1/traces/needs.json - export as sphinx-needs format
- Add V-model visualization to dashboard:
  - New traceability section with HTMX polling
  - V-model chain (REQ→DESIGN→IMPL→TEST→PROOF→REVIEW) per requirement
  - Traceability matrix table showing status of each artifact type
  - CSS styles for vmodel-container, vmodel-step, vmodel-chain
- Add comprehensive tests:
  - TraceabilityMatrix::from_records (grouping, failure override, CSV export)
  - TraceStore::list_all with filter combinations
  - API endpoint tests (records, matrix, needs.json)
  - Dashboard partial tests (empty state, with records)
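The `from_records` grouping with failure override can be sketched with ordered maps. The record shape, the stage enum, the CSV layout, and the rule that any failing record marks its cell as failed are assumptions for illustration; the commit only names grouping, failure override, and CSV export.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Stage { Req, Design, Impl, Test, Proof, Review }

/// Column order and labels for the V-model chain.
const STAGES: [(Stage, &str); 6] = [
    (Stage::Req, "REQ"), (Stage::Design, "DESIGN"), (Stage::Impl, "IMPL"),
    (Stage::Test, "TEST"), (Stage::Proof, "PROOF"), (Stage::Review, "REVIEW"),
];

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Status { Passed, Failed }

/// Hypothetical, simplified trace record.
struct TraceRecord { requirement_id: String, stage: Stage, passed: bool }

/// requirement id -> stage -> aggregated status
struct Matrix(BTreeMap<String, BTreeMap<Stage, Status>>);

impl Matrix {
    fn from_records(records: &[TraceRecord]) -> Self {
        let mut rows: BTreeMap<String, BTreeMap<Stage, Status>> = BTreeMap::new();
        for r in records {
            let cell = rows
                .entry(r.requirement_id.clone())
                .or_default()
                .entry(r.stage)
                .or_insert(Status::Passed);
            // Failure override (assumed rule): one failing record fails the cell.
            if !r.passed {
                *cell = Status::Failed;
            }
        }
        Matrix(rows)
    }

    fn to_csv(&self) -> String {
        let mut out = String::from("requirement");
        for (_, label) in STAGES {
            out.push(',');
            out.push_str(label);
        }
        out.push('\n');
        for (req, cells) in &self.0 {
            out.push_str(req);
            for (stage, _) in STAGES {
                out.push(',');
                // Empty cell means no record exists for that stage yet.
                out.push_str(match cells.get(&stage) {
                    Some(Status::Passed) => "pass",
                    Some(Status::Failed) => "fail",
                    None => "",
                });
            }
            out.push('\n');
        }
        out
    }
}
```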
Graceful shutdown (SIGTERM/SIGINT handler):
- Handle both SIGINT (Ctrl+C) and SIGTERM via tokio signal handler
- Track all spawned agent child process PIDs via ProcessTracker
- On shutdown: SIGTERM all tracked PIDs, wait 30s, then SIGKILL survivors
- Reset all claimed/implementing/integrating tasks back to pending
- Clean up all worktrees created during this engine run
- Check main repo working tree for unexpected modifications and warn
- Clean up stale thrum-sysprompt temp files
Startup recovery (beginning of run_parallel):
- Kill orphaned claude -p processes (matched by thrum-sysprompt pattern)
- Scan worktrees/ dir for orphaned worktrees and remove them
- Reset stuck tasks in claimed/implementing/integrating to dispatchable
- Check git status of all managed repos for uncommitted changes and warn
- Clean up stale thrum-sysprompt-*.md temp files from dead processes
- All recovery actions logged clearly for operator visibility
New module: thrum-runner/src/shutdown.rs
- ProcessTracker: Arc<Mutex<HashSet<u32>>> for tracking child PIDs
- send_signal/is_process_alive: Unix signal helpers via libc
- run_startup_recovery: orchestrates all startup checks
- run_shutdown_cleanup: orchestrates all shutdown cleanup
- Comprehensive tests for process tracker, orphan detection, etc.
Wire-up changes:
- subprocess.rs: new tracked variants register/unregister PIDs
- claude.rs: ClaudeCliBackend carries optional ProcessTracker
- backend.rs: build_registry_from_config_tracked passes tracker
- parallel.rs: PipelineContext carries ProcessTracker
- main.rs: cmd_run_parallel creates tracker and wires it through
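The `ProcessTracker` shape named above (`Arc<Mutex<HashSet<u32>>>`) can be sketched as a cloneable handle. Method names here are illustrative, and the libc signal sending (SIGTERM, 30s wait, SIGKILL) is deliberately left out; `std::process::Child::id()` returns the `u32` PID that would be registered.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Cloneable handle around the shared PID set; every spawn site gets a clone.
#[derive(Clone, Default)]
struct ProcessTracker(Arc<Mutex<HashSet<u32>>>);

impl ProcessTracker {
    /// Call right after spawning, e.g. with `child.id()`.
    fn register(&self, pid: u32) {
        self.0.lock().unwrap().insert(pid);
    }

    /// Call once the child has been waited on.
    fn unregister(&self, pid: u32) {
        self.0.lock().unwrap().remove(&pid);
    }

    /// Sorted snapshot of PIDs still registered; shutdown would signal these.
    fn snapshot(&self) -> Vec<u32> {
        let mut pids: Vec<u32> = self.0.lock().unwrap().iter().copied().collect();
        pids.sort_unstable();
        pids
    }
}
```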
- Timeline steps now render as <a> links with title tooltips showing the full stage name and description (e.g. "Gate 1: Quality: Automated quality checks: cargo fmt, clippy, and tests.")
- Status badges have tooltips explaining current state and next step
- Timeline step labels link to the relevant section on /dashboard/help
- New /dashboard/help (and /dashboard/docs alias) route serving a self-contained pipeline reference page with:
  - ASCII state machine diagram
  - Detailed stage cards for all 9 pipeline stages
  - Retry logic and escalation strategy table
  - Budget model documentation
  - Status badge reference grid
  - Timeline key with color legend
- Collapsible pipeline legend on the main dashboard (HTML <details>) showing the full P→I→G1→R→G2→A→Int→CI→M flow with color key
- Help link (?) in dashboard header for quick access to docs
- All documentation is self-contained in the server binary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…activity log
- Add subtitle descriptions under each dashboard section header explaining its purpose (Task Queue, Agent Activity, Remote Sync, Memory, Pipeline Events)
- Add hover tooltips on section headers with longer explanations
- Rename "Activity Log" to "Pipeline Events" for clarity
- Filter Activity Log (HTMX-polled traces) to show only pipeline-meaningful events: gate results, state transitions, errors, warnings, and events with pipeline-specific structured fields (task.id, gate.level, etc.)
- Filter EngineLog SSE events client-side to exclude infrastructure noise (config loading, CLI invocations, subprocess spawning, etc.)
- Add is_pipeline_event() and is_pipeline_log_message() to thrum-core telemetry with comprehensive test coverage
- Add pipeline_only flag to TraceFilter for opt-in pipeline filtering
- Style section descriptions with italic muted text and a dotted underline on hoverable headers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add task_title field to AgentStarted event (with backward-compatible serde default)
- Show task title alongside task ID in agent card headers (live + dashboard)
- Add live elapsed time counter that ticks every second for active agents
- Auto-collapse finished/failed agent cards after 60s with CSS transition
- Add clickable link from agent card header to task detail/review page
- Track finished_at timestamp to distinguish active vs completed agents
- Add tests: stage progression, task title capture, elapsed tracking, backward compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The retry button's onclick+fetch mechanism worked server-side but had
two interacting issues making it appear broken:
1. Missing CSS for .action-result class - success/error messages were
inserted into the DOM but had no styling, making them nearly invisible
2. No .catch() handler on fetch() - any request failures were silently
swallowed with no user feedback
Fixes:
- Add .action-result CSS with styled banners (green success, red error)
and fade-in animation for clear visibility
- Add .catch() error handler and HTTP status checking to taskAction()
- Extract showActionResult() helper for consistent message display
- Pass button element to taskAction() for loading state feedback
(disabled + "…" text during request, prevents double-clicks)
- Guard getElementById calls with null checks for robustness
- Hide empty #task-action-result via CSS (:empty { display: none })
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gent work
Three key fixes to the change detection logic in parallel.rs:
1. has_commits_beyond_main() error now defaults to true (fail-safe).
Previously it defaulted to false, which, combined with a clean worktree (dirty=false), resulted in has_changes=false, silently discarding committed agent work when git errored (e.g. index lock contention).
2. Added retry-with-delay for has_commits_beyond_main(), matching the existing retry pattern for is_clean(). Transient index lock errors from concurrent agents get a second chance before falling back.
3. Added filesystem-level fallback via has_modified_source_files(). When git reports no changes, it scans the worktree for recently modified source files as an independent safety net. This catches stale/corrupted git index cases that both git checks might miss.
Includes 4 new tests for the filesystem fallback function covering:
- Detection of recent source files
- Ignoring .git/ and target/ directories
- Nested source file detection
- Graceful handling of nonexistent directories
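Fix 2 follows a common retry-with-delay pattern. A generic sketch, with a hypothetical helper name, that returns the first success or the last error:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: call `op` up to `attempts` times, sleeping `delay`
/// between failed tries. Returns the first `Ok`, or the last `Err`.
fn retry_with_delay<T, E>(
    attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut result = op();
    for _ in 1..attempts {
        if result.is_ok() {
            break;
        }
        // Give a concurrent agent time to release .git/index.lock.
        sleep(delay);
        result = op();
    }
    result
}
```

Under fix 1, a final `Err` from has_commits_beyond_main() would then be mapped to `true`, for example with `result.unwrap_or(true)`.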
- Enable axum's `ws` feature for WebSocket upgrade support
- Add `ws.rs` module with WebSocket handler that:
  - Streams all EventBus PipelineEvents to clients as JSON
  - Accepts incoming JSON commands (ping, with ack for future commands)
  - Uses an mpsc channel to bridge recv loop responses to the send loop
- Wire /ws route into api_router alongside existing SSE endpoint
- Update dashboard.html JavaScript to connect via WebSocket first, falling back to SSE (/api/v1/events/stream) if WS is unavailable
- Add exponential backoff reconnection for WebSocket connections
- Add wsSendCommand() helper for sending commands from the browser
- Comprehensive test coverage: unit tests for serialization/deserialization, integration test with real TCP WebSocket connection
- All existing SSE, dashboard, and A2A tests continue to pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ption
Previously, the sandbox granted write access to the entire .git/ common directory. This allowed agents to modify .git/config, which could set core.bare=true and break the repo for all subsequent operations.
Now write access is granted only to the specific subdirectories agents need (objects/, refs/, info/, logs/) and to specific files (packed-refs, shallow, FETCH_HEAD). The .git/config file is no longer writable by sandboxed agents.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
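The narrowed allow-list can be expressed as seatbelt rules: `subpath` for directories and `literal` for single files. This sketch is hypothetical (function name, rule text) and simply omits `config`, so no write rule covers it.

```rust
use std::path::Path;

/// Subdirectories of the shared .git dir that commits need (per the commit).
const GIT_WRITE_DIRS: [&str; 4] = ["objects", "refs", "info", "logs"];
/// Individual files that must stay writable.
const GIT_WRITE_FILES: [&str; 3] = ["packed-refs", "shallow", "FETCH_HEAD"];

/// Hypothetical helper: SBPL write rules for the git common dir. `config` is
/// intentionally absent, so an agent cannot flip settings such as core.bare.
fn git_common_dir_write_rules(common_dir: &Path) -> Vec<String> {
    let dirs = GIT_WRITE_DIRS.iter().map(|d| {
        format!("(allow file-write* (subpath \"{}\"))", common_dir.join(d).display())
    });
    let files = GIT_WRITE_FILES.iter().map(|f| {
        format!("(allow file-write* (literal \"{}\"))", common_dir.join(f).display())
    });
    dirs.chain(files).collect()
}
```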
… set
The traceability V-model view showed REQ as permanently empty because the Requirement trace was only emitted when task.requirement_id was explicitly set (which never happens for auto-generated tasks). Now REQ is always emitted, using the task title/description as the requirement.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major efficiency bug: retry prompts included error context from unrelated tasks. query_errors_for_repo() returned ALL errors for the entire repo, so TASK-0023 would get errors from TASK-0036, TASK-0044, etc. Agents spent 20 minutes confused trying to fix nonexistent bugs and timed out. Added query_errors_for_task() that filters by task_id, ensuring agents only see their own previous failures on retry. Also fixed: CLI set-status --status pending now resets retry_count to 0. Previously retry counts accumulated across engine restarts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
create_branch() used force=false which fails if the branch already exists from a previous killed run. Changed to force=true to reset the branch to current HEAD, matching create_branch_detached() behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When using worktrees, the branch is already created by create_branch_detached and checked out by git worktree add. Calling create_branch again fails because git refuses to force-update a branch that is the current HEAD of a worktree. Also raises budget ceiling from $2000 to $3000. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Strip CLAUDE_CODE_ENTRYPOINT env var in all subprocess spawn paths to prevent Claude CLI nesting detection when engine runs inside a Claude Code session. Bump agent timeout from 1200s to 2400s to give agents enough time to complete pre-commit hooks. Add woven-thread SVG favicon using the Thrum thread color palette (amber, teal, violet, rose) and wire the serving route. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…red results
- Add cargo-audit check: scans for known vulnerabilities in dependencies; fails the gate if any advisory has CVSS >= 7.0
- Add cargo-deny check: enforces license policy, bans specific crates, detects duplicate dependencies (requires deny.toml in repo root)
- Add cargo-mutants support (opt-in per repo): runs mutation testing on changed files only, reports mutation survival rate, warns if > 20% of mutations survive
- Make gate checks configurable per repo via the `checks` field in repos.toml: `checks = ["cargo_fmt", "cargo_clippy", "cargo_test", "cargo_audit", "cargo_deny", "cargo_mutants"]` (defaults to the original three)
- Add per-check timing (duration_secs) to CheckResult for identifying slow checks and optimization opportunities
- Add structured findings (CheckFinding) to CheckResult: each check reports machine-readable findings with category, severity, message, and optional numeric value for dashboard display and trend analysis
- Add MutantsConfig for per-repo mutation testing options: changed_files_only, max_survival_rate, timeout_secs, extra_args
- Update verification tag mapping to include new check types
- Add CheckResult::simple() convenience constructor
- Update all example configs with new check documentation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
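The thresholds described above reduce to simple predicates. This sketch assumes CVSS scores and mutant counts have already been parsed from the tools' output, and treats only "missed" mutants as survivors; how Thrum counts timed-out or unviable mutants is not stated, and all names are hypothetical.

```rust
/// Default `checks` when repos.toml omits the field (the original three).
fn default_checks() -> Vec<&'static str> {
    vec!["cargo_fmt", "cargo_clippy", "cargo_test"]
}

/// cargo-audit rule: fail the gate if any advisory has CVSS >= 7.0.
fn audit_fails(cvss_scores: &[f32]) -> bool {
    cvss_scores.iter().any(|&score| score >= 7.0)
}

#[derive(Debug, PartialEq)]
enum MutantsVerdict {
    Ok,
    Warn,
}

/// cargo-mutants rule: warn when the survival rate exceeds `max_survival_rate`.
/// Returns the computed rate alongside the verdict.
fn mutants_verdict(caught: u32, missed: u32, max_survival_rate: f64) -> (f64, MutantsVerdict) {
    let total = caught + missed;
    let rate = if total == 0 { 0.0 } else { missed as f64 / total as f64 };
    let verdict = if rate > max_survival_rate {
        MutantsVerdict::Warn
    } else {
        MutantsVerdict::Ok
    };
    (rate, verdict)
}
```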
Implement a complete harness improvement feedback loop that enables
the system to learn from gate failures and human rejections:
- Human rejection prompts: when rejecting a task via CLI, `--gap-check`
captures what check would have caught the issue, stored as a HarnessGap
- Repeated failure analysis: `analyze_repeated_failures()` distinguishes
good catches from missing earlier checks based on occurrence patterns
- Auto-creation of harness improvement tasks from identified gaps
- Effectiveness metrics: detection rate, precision, false positive rate
per check, with aggregate harness-wide statistics
- Self-test capability: mutation testing framework that injects known
defects and verifies the harness catches them
- Full persistence layer via redb (HarnessStore) with gap and self-test
result tables
- CLI subcommands: `thrum harness {gaps,metrics,show,add-gap,resolve,
create-task,self-test,self-test-results}`
Also fixes pre-existing bug in sync.rs where git subprocess calls
leaked GIT_DIR/GIT_INDEX_FILE/GIT_WORK_TREE env vars during pre-commit
hook execution, causing rebase_branch tests to fail.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add invoke_streaming() to the AiBackend trait with a StreamingContext that carries the EventBus, AgentId, and TaskId needed for real-time output. ClaudeCliBackend and CliAgentBackend override this to use run_cmd_streaming() with a LineCallback that emits AgentOutput events line by line as subprocess output arrives.
- Add StreamingContext struct to backend.rs
- Add invoke_streaming() default method (delegates to invoke())
- Override in ClaudeCliBackend with run_cmd_streaming_tracked()
- Override in CliAgentBackend with run_cmd_streaming()
- Add sandbox_profile support to streaming subprocess functions
- Switch parallel pipeline's implementation call to invoke_streaming()
- Fix sync.rs git commands to clean env vars in worktree contexts
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Planner produces Spec stored in task metadata, used for traceability
- Spec requirements feed into traceability chain with per-requirement IDs
- Proof obligations configure Gate 2 checks (prover + file existence)
- Implementer receives spec as Markdown context in prompt
- Gate checks verify implementation matches spec (affected files + proofs)
- Spec visible and editable on dashboard (TOML editor + markdown preview)
- CLI `thrum task spec <id>` for viewing/setting specs
- JSON API endpoints GET/POST /api/v1/tasks/{id}/spec
- Fix git env var leakage in sync.rs that broke rebase tests in worktrees
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add spec section rendering to review page (render_description_section)
- Add API endpoint tests: GET/POST /api/v1/tasks/{id}/spec roundtrip
- Add dashboard tests: spec section visibility, add-spec form, update-spec action
- Add planner JSON deserialization test
- Add traceability integration test
- Add run_spec_proof_checks integration test
- Add spec acceptance_criteria to tagged_criteria test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ution
Implement DAG-based task dependencies with cycle detection, file-level
conflict prediction, batch barriers, and post-merge compilation checks
to ensure safe parallel task dispatch.
New module: thrum-core/src/dependency.rs
- DependencyGraph with add/remove edges, cycle detection (DFS coloring),
topological sort (Kahn's algorithm), and eligibility checks
- TaskDependency, BatchBarrier, PostMergeCheck, PredictedConflict types
- predict_conflicts() uses spec.design.affected_files for overlap detection
- Comprehensive test suite (20+ unit tests)
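Kahn's algorithm yields both a dispatch order and a cycle check: if some tasks never reach in-degree zero, they sit on or behind a cycle. This is a standalone sketch with `u32` task ids, not the `DependencyGraph` from thrum-core, which uses DFS coloring for cycle detection.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Illustrative graph: an edge `a -> b` means task `b` depends on task `a`.
#[derive(Default)]
struct DependencyGraph {
    edges: BTreeMap<u32, BTreeSet<u32>>,
}

impl DependencyGraph {
    fn add_task(&mut self, task: u32) {
        self.edges.entry(task).or_default();
    }

    fn add_dependency(&mut self, task: u32, depends_on: u32) {
        self.add_task(task);
        self.edges.entry(depends_on).or_default().insert(task);
    }

    /// Kahn's algorithm. Ok(order), or Err(tasks that could not be ordered:
    /// cycle members plus anything downstream of them).
    fn topo_sort(&self) -> Result<Vec<u32>, Vec<u32>> {
        let mut indegree: BTreeMap<u32, usize> = self.edges.keys().map(|&t| (t, 0)).collect();
        for succs in self.edges.values() {
            for s in succs {
                *indegree.get_mut(s).unwrap() += 1;
            }
        }
        // Start with every task that has no unmet dependencies.
        let mut ready: VecDeque<u32> = indegree
            .iter()
            .filter(|&(_, &d)| d == 0)
            .map(|(&t, _)| t)
            .collect();
        let mut order = Vec::new();
        while let Some(t) = ready.pop_front() {
            order.push(t);
            for s in &self.edges[&t] {
                let d = indegree.get_mut(s).unwrap();
                *d -= 1;
                if *d == 0 {
                    ready.push_back(*s);
                }
            }
        }
        if order.len() == indegree.len() {
            Ok(order)
        } else {
            Err(indegree.into_iter().filter(|&(_, d)| d > 0).map(|(t, _)| t).collect())
        }
    }
}
```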
Task model changes:
- depends_on: Vec<TaskDependency> field on Task
- batch_barrier: Option<BatchBarrier> field on Task
- has_dependencies() and dependencies_satisfied() helper methods
Event system:
- TaskBlocked, TaskUnblocked, DependencyCycleDetected
- BatchBarrierReached, PostMergeCheckCompleted, PredictedConflictDetected
Database:
- completed_task_ids() returns set of merged task IDs
- claim_next_with_deps() skips tasks with unsatisfied dependencies
Runner:
- emit_predicted_conflicts() warns about overlapping file lists
- run_post_merge_check() validates compilation between batches
- dispatch_batch() now uses dependency-aware task claiming
API & Dashboard:
- POST/GET /api/v1/tasks/{id}/dependencies endpoints
- GET /api/v1/dependencies/graph with full graph response
- Dashboard dependency partial with graph table, conflicts, barriers
- Task rows show dependency count and batch barrier badges
Also fix pre-existing bug: sync.rs git subprocesses now clear
GIT_DIR/GIT_INDEX_FILE/GIT_WORK_TREE env vars via clean_git_cmd()
helper, preventing test failures when run from within git hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
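A `clean_git_cmd()`-style helper can be sketched with `std::process::Command::env_remove`. Git exports variables such as these when it runs hooks so that git commands inside the hook find the right repository; a child `git` started from a hook would otherwise act on the hook's repository and index. The function body is an illustration, not the sync.rs code.

```rust
use std::path::Path;
use std::process::Command;

/// Variables inherited from a hook context that redirect child git commands.
const LEAKY_GIT_VARS: [&str; 3] = ["GIT_DIR", "GIT_INDEX_FILE", "GIT_WORK_TREE"];

/// Sketch of a `git` Command rooted at `dir` with hook-exported variables removed,
/// so git rediscovers the repository from the working directory.
fn clean_git_cmd(dir: &Path) -> Command {
    let mut cmd = Command::new("git");
    cmd.current_dir(dir);
    for var in LEAKY_GIT_VARS {
        cmd.env_remove(var);
    }
    cmd
}
```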
- Add gate_report() and failing_check_names() helpers to TaskStatus
- Add gate_history field to Task for preserving reports across retries
- Extend API TaskResponse with gate_report, failing_checks, gate_history (serialized only when present via skip_serializing_if)
- Add expandable gate failure report section to dashboard task detail partial, reusing existing render_single_gate_report()
- Add collapsible gate history section showing previous attempt reports
- Make task row tooltips dynamic: gate-failed rows show failing check names (e.g. "Gate 1 failed: cargo_fmt, cargo_clippy")
- Push current gate report onto gate_history before retry transition in parallel pipeline runner
- Fix pre-existing worktree GIT_DIR env leak in sync.rs by adding git_cmd() helper that clears GIT_DIR/GIT_WORK_TREE/GIT_INDEX_FILE
- Add 10 new tests covering all new functionality
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- RepoCooldownTracker: per-repo cooldown with exponential backoff (60s base, scales with consecutive failures, capped at 300s)
- Rapid failure detection: if an agent fails in <30s, inject a 120s cooldown before retry to prevent tight failure loops
- dispatch_batch() checks repo cooldown before dispatching tasks
- run_agent_task() records success/failure for cooldown tracking
- Fix git env var leakage (GIT_INDEX_FILE, GIT_DIR, GIT_WORK_TREE) in build.rs and sync.rs to prevent index corruption during hooks
- 11 new tests covering cooldown tracker and rapid failure detection
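The cooldown rules can be sketched as a pure function plus a per-repo map. The doubling schedule (60s, 120s, 240s, then the 300s cap) is an assumption: the commit only says the 60s base scales with consecutive failures. Passing `now` explicitly keeps the sketch testable, and the names are illustrative.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

const BASE: Duration = Duration::from_secs(60);
const CAP: Duration = Duration::from_secs(300);
const RAPID_FAILURE: Duration = Duration::from_secs(30);
const RAPID_COOLDOWN: Duration = Duration::from_secs(120);

/// Cooldown after `failures` consecutive failures (assumed doubling), with a
/// 120s floor when the failing run lasted under 30s.
fn cooldown_for(failures: u32, run_time: Duration) -> Duration {
    if failures == 0 {
        return Duration::ZERO;
    }
    let shift = (failures - 1).min(8);
    let backoff = (BASE * (1u32 << shift)).min(CAP);
    if run_time < RAPID_FAILURE {
        backoff.max(RAPID_COOLDOWN)
    } else {
        backoff
    }
}

/// repo -> (consecutive failures, cooldown deadline)
#[derive(Default)]
struct RepoCooldownTracker {
    state: HashMap<String, (u32, Instant)>,
}

impl RepoCooldownTracker {
    fn record_failure(&mut self, repo: &str, run_time: Duration, now: Instant) {
        let e = self.state.entry(repo.to_string()).or_insert((0, now));
        e.0 += 1;
        e.1 = now + cooldown_for(e.0, run_time);
    }

    /// A success resets the backoff for that repo.
    fn record_success(&mut self, repo: &str) {
        self.state.remove(repo);
    }

    fn is_cooling_down(&self, repo: &str, now: Instant) -> bool {
        self.state.get(repo).map_or(false, |&(_, until)| now < until)
    }
}
```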
Add per-role timeout recovery strategies (retry/skip/extend/fail) that replace the previous one-size-fits-all failure behavior. Review timeouts can now auto-approve with a "review-skipped-timeout" note instead of blocking the pipeline. Implementation timeouts check for partial work before declaring failure.
Key changes:
- TimeoutRecoveryStrategy enum in thrum-core with serde support
- TimeoutRecovered event for observability of timeout recovery actions
- handle_review_timeout helper in parallel.rs with Skip/Retry/Extend/Fail
- Reviewer timeout handling in both run_task_pipeline and retry_task_pipeline
- Implementation timeout memory persistence (error_type: "implementation_timeout")
- Default strategies: implementer=Retry, reviewer=Skip, ci_fixer=Retry
- Fix git env var leakage in sync.rs that caused flaky tests in worktrees
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
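The per-role defaults and the review-timeout branch can be sketched as two `match` expressions. Role names are plain strings here for brevity (the real code uses `AgentRole`), the outcome enum is hypothetical, and `Fail` is the `#[default]` variant, so a role with no configured strategy fails on timeout.

```rust
/// The four strategies named in the commit; `Fail` is the default.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum TimeoutRecoveryStrategy {
    Retry,
    Skip,
    Extend,
    #[default]
    Fail,
}

/// Hypothetical outcome of a reviewer timeout.
#[derive(Debug, PartialEq)]
enum ReviewOutcome {
    AutoApproved(&'static str),
    RetryReview,
    ExtendTimeout,
    Failed,
}

/// Per-role defaults from the commit; unknown roles fall back to `Fail`.
fn default_strategy(role: &str) -> TimeoutRecoveryStrategy {
    match role {
        "implementer" | "ci_fixer" => TimeoutRecoveryStrategy::Retry,
        "reviewer" => TimeoutRecoveryStrategy::Skip,
        _ => TimeoutRecoveryStrategy::default(),
    }
}

fn handle_review_timeout(strategy: TimeoutRecoveryStrategy) -> ReviewOutcome {
    match strategy {
        // Approve instead of blocking the pipeline, leaving a note for humans.
        TimeoutRecoveryStrategy::Skip => ReviewOutcome::AutoApproved("review-skipped-timeout"),
        TimeoutRecoveryStrategy::Retry => ReviewOutcome::RetryReview,
        TimeoutRecoveryStrategy::Extend => ReviewOutcome::ExtendTimeout,
        TimeoutRecoveryStrategy::Fail => ReviewOutcome::Failed,
    }
}
```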
…ery config
Three key changes:
1. Fix critical worktree path bug in salvage and change detection (parallel.rs lines ~1410 and ~1478). When running inside a worktree, repo_config.path was already set to the worktree path via with_work_dir(), but the code appended "worktrees/<branch>", creating a nested path that didn't exist. This caused both salvage_uncommitted() and change detection to fail silently, making timed-out agents appear to have "no changes" even when they had committed real work before the timeout.
2. Add timeout_recovery config to pipeline.toml and all example configs. The TimeoutRecoveryStrategy enum and AgentRole field already existed in code, but no config file specified the value, so deserialized roles got the serde default (Fail) instead of role-appropriate defaults (implementer=retry, reviewer=skip).
3. Emit TimeoutRecovered event when an agent times out but committed changes exist (continued-with-partial-changes). Previously only the salvage path emitted this event; committed-before-timeout was silent.
Also adds tests for pipeline.toml format deserialization of timeout_recovery, default behavior when omitted, and new recovery action event variants.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire trust assessment into the pipeline, dashboard, and approval flow:
- New trust.rs module with TrustConfig, RiskLevel enum (AutoOk < Standard < SecuritySensitive < HighRisk), TrustAssessment, and glob-matching engine
- RepoConfig gains optional trust: Option<TrustConfig> from [repo.trust]
- CheckpointSummary gains optional trust_assessment field
- Pipeline computes trust assessment using changed_files_on_branch() at both approval checkpoint sites
- High-risk files log warnings and set requires_human_review flag
- Security-sensitive files trigger cargo-audit/cargo-deny checks
- Dashboard review page renders color-coded trust section with per-file risk table (red=high, orange=security, blue=standard, green=ok)
- Task rows and detail view show trust risk badges
- approve_action blocks high-risk tasks unless force=true
- bulk_approve skips high-risk tasks with explanatory message
- Planner agent prompt references trust boundaries for risk assessment
- Example configs include [repo.trust] sections
- Integration tests for DB roundtrip, approval blocking, config parsing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
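Risk ordering falls out of `#[derive(PartialOrd, Ord)]` on an enum declared in ascending order. The sketch below uses a deliberately minimal glob (`*` only, matching across `/`), treats files that match no rule as `Standard`, and takes a task's risk as the maximum over its files; all three are assumptions, not the trust.rs rules.

```rust
/// Declared in ascending order, so derived Ord gives AutoOk < ... < HighRisk.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum RiskLevel {
    AutoOk,
    Standard,
    SecuritySensitive,
    HighRisk,
}

/// Minimal glob: `*` matches any run of characters (including `/`); everything
/// else matches literally. Real glob libraries distinguish `*` from `**`.
fn glob_match(pattern: &str, path: &str) -> bool {
    let (p, s) = (pattern.as_bytes(), path.as_bytes());
    let (mut pi, mut si) = (0usize, 0usize);
    let mut star: Option<usize> = None;
    let mut mark = 0usize;
    while si < s.len() {
        if pi < p.len() && p[pi] == b'*' {
            // Remember the star and first try matching zero characters.
            star = Some(pi);
            mark = si;
            pi += 1;
        } else if pi < p.len() && p[pi] == s[si] {
            pi += 1;
            si += 1;
        } else if let Some(sp) = star {
            // Backtrack: let the last star absorb one more character.
            pi = sp + 1;
            mark += 1;
            si = mark;
        } else {
            return false;
        }
    }
    // Trailing stars can match the empty string.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

struct TrustRule {
    pattern: &'static str,
    level: RiskLevel,
}

/// Returns (overall risk, requires_human_review).
fn assess(rules: &[TrustRule], changed_files: &[&str]) -> (RiskLevel, bool) {
    let overall = changed_files
        .iter()
        .map(|f| {
            rules
                .iter()
                .filter(|r| glob_match(r.pattern, f))
                .map(|r| r.level)
                .max()
                .unwrap_or(RiskLevel::Standard)
        })
        .max()
        .unwrap_or(RiskLevel::AutoOk);
    (overall, overall == RiskLevel::HighRisk)
}
```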
Replace all hx-trigger="every Ns" polling in the dashboard with event-driven custom triggers (refreshBudget, refreshStatus, refreshTasks, refreshMemory, refreshTraceability, refreshActivity). Add a form protection layer to prevent data loss during DOM morphing, surgical JSON update endpoints (/dashboard/api/status, /dashboard/api/budget), debounced refresh timers, and dismissible action banners.
- Add BudgetUpdated, MemoryUpdated, TaskDataChanged event variants
- Emit events from dashboard action handlers for reactive updates
- Add clean_git helper in sync.rs to isolate subprocess git context
- Add comprehensive test coverage for reactive dashboard behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The filesystem fallback (has_modified_source_files) was running unconditionally when git reported no changes. Since worktree checkout sets all file mtimes to "now", the 24h mtime check always triggered a false positive, letting empty branches sail through the entire pipeline (gate1 passes trivially on unchanged code, review is informational-only). Now the filesystem fallback only runs when git operations actually errored (fail-safe path). When both is_clean() and has_commits_beyond_main() succeed and report no changes, we trust the git result and correctly fail the task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
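The resulting decision can be sketched as a single `match`. This is one plausible arrangement consistent with the commits in this PR (a failed has_commits_beyond_main() is treated fail-safe, and the filesystem scan runs only when git errors); the function signature is hypothetical, and both git results are assumed to have been retried already.

```rust
/// Hypothetical decision helper. `is_clean` and `commits_beyond_main` are the
/// (already retried) git results; `fs_fallback` runs the mtime-based scan.
fn has_changes(
    is_clean: Result<bool, String>,
    commits_beyond_main: Result<bool, String>,
    fs_fallback: impl FnOnce() -> bool,
) -> bool {
    match (is_clean, commits_beyond_main) {
        // Both git checks succeeded: trust them, even when they say "no changes".
        // A fresh checkout gives every file a new mtime, so the scan would misfire.
        (Ok(clean), Ok(commits)) => !clean || commits,
        // Commit check failed: fail safe and assume committed work exists.
        (_, Err(_)) => true,
        // Only is_clean() failed: use commits if present, else ask the filesystem.
        (Err(_), Ok(commits)) => commits || fs_fallback(),
    }
}
```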
Resolve compilation errors and test failures introduced during the rebase onto origin/main:
- Add missing CheckResult fields (duration_secs, findings) in API tests
- Add missing trust_assessment field to CheckpointSummary initializers
- Remove duplicate spec_api test body misplaced as task_actions_emit_state_change_events
- Fix broken Request builder chain with orphaned .uri()/.body() calls
- Split misplaced spec tests out of cooldown_tests module into spec_tests
- Add missing RepoConfig fields (checks, mutants, trust) in runner tests
- Remove orphaned doc comment and duplicate git_cmd function in sync.rs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Plan to reduce Thrum from ~40K lines to ~6K by delegating worktree management, sandboxing, session handling, and agent prompts to Claude Code 2.1.71+. Thrum retains its unique value: enforced quality gates, durable task queue, multi-repo integration, and human approval. Key insight: --output-format stream-json gives real-time visibility into agent tool calls, solving the dashboard observability problem. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace manual Default impls with #[default] attribute for DependencyKind, TimeoutRecoveryStrategy, and AuditLevel to satisfy clippy::derivable_impls on CI. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 1 of thin-thrum migration: introduces AgentConfig/AgentEvent/AgentResult types and spawn_agent() which invokes `claude -p` with --output-format stream-json. Bridge function invoke_streaming() returns legacy AiResponse for pipeline compat. parallel.rs now uses claude_code directly instead of AiBackend trait dispatch. Updates PLAN-THIN-THRUM.md with ultra-thin agent-teams variant (~3,300 LOC target). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
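A spawn_agent()-style command can be assembled with `std::process::Command`. The flags below are the ones named in this PR (`-p`, `--output-format stream-json`, `--include-partial-messages`, `--permission-mode`), and `CLAUDE_CODE_ENTRYPOINT` is stripped as described for subprocess spawns. The helper name is hypothetical, and the accepted flags should be confirmed with `claude --help` for the installed version.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Hypothetical builder for a spawn_agent()-style invocation.
fn claude_command(prompt: &str, permission_mode: &str, workdir: &Path) -> Command {
    let mut cmd = Command::new("claude");
    cmd.arg("-p")
        .arg(prompt)
        // Newline-delimited JSON events instead of one final JSON document.
        .args(["--output-format", "stream-json", "--include-partial-messages"])
        .args(["--permission-mode", permission_mode])
        .current_dir(workdir)
        // Avoid the CLI's nesting detection when the engine runs inside Claude Code.
        .env_remove("CLAUDE_CODE_ENTRYPOINT")
        .stdout(Stdio::piped());
    cmd
}
```

With stdout piped, the caller can read the child's output through `BufRead::lines()` and parse each line as one JSON event.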
Phase 2 of thin-thrum: all agent invocations now go through claude_code::invoke_streaming() directly. Removes the entire BackendRegistry + AiBackend trait + 4 implementors:
- anthropic.rs (Anthropic Messages API)
- openai_compat.rs (OpenAI/Mistral/custom providers)
- cli_agent.rs (generic CLI agent wrapper)
- claude.rs (Claude CLI wrapper via subprocess)
- backend.rs (trait definitions, registry, request/response types)
AiResponse type moved to claude_code.rs. Migrated ci.rs dispatch_ci_fixer and main.rs invoke_planner to use claude_code directly. Removed unused async-openai and reqwest dependencies from thrum-runner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Steps 1-2 complete: claude_code.rs created, old backend stack deleted. Step 3 analysis shows most core modules are deeply integrated — mass deletion requires cascading changes. Updated line count table with current figures (39,202 LOC, down from 40,340). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code handles sandboxing via --permission-mode, so the sandbox_profile parameter (already unused) is removed from run_task_pipeline and retry_task_pipeline. Sandbox profile creation still happens in the dispatch function for seatbelt/observe modes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… harness, watch (-6,667 lines)
Phase 3 of thin-thrum migration. Removes 8 isolated feature modules that aren't core to the orchestration loop:
- a2a.rs (core + api): Agent-to-agent protocol types and HTTP endpoints
- safety.rs: ISO 26262 tool classification (AsilLevel moved to repo.rs)
- sphinx_needs.rs: Requirements export format
- consistency.rs: Cross-repo version consistency checks
- harness.rs + harness_store.rs: Self-testing harness and persistence
- watch.rs: TUI dashboard (replaced by web dashboard)
All CLI subcommands (check, trace, safety, harness) and their handlers removed. Reject --gap-check flag removed. Release traceability export simplified. All 538 tests pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These were only used by deleted modules (watch.rs TUI, consistency.rs cross-repo checks). Removes them from both crate and workspace Cargo.toml. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Key changes
Build fixes (across 9 files)
- Add missing `duration_secs`/`findings` fields to `CheckResult` initializers
- Add missing `trust_assessment` field to `CheckpointSummary` initializers
- Remove the duplicate spec_api body from the `task_actions_emit_state_change_events` test (it was a corrupted copy of the spec_api roundtrip test)
- Fix broken Request builder chain with orphaned `.uri()`/`.body()` calls
- Split misplaced spec tests out of the `cooldown_tests` module into `spec_tests`
- Add missing `checks`/`mutants`/`trust` fields to `RepoConfig` in runner tests
- Remove orphaned doc comment and duplicate `git_cmd` function in sync.rs
Strategic plan
- `--output-format stream-json --include-partial-messages` for real-time visibility
Test plan
- `cargo fmt`: clean
- `cargo clippy -D warnings`: clean
- `cargo test --workspace`: 638 tests passing
- `cargo udeps`: clean
🤖 Generated with Claude Code