Claude/implement p1 guardrails dc og i #266
Open
PetrAnto wants to merge 322 commits into cloudflare:main from
Conversation
debug: add granular logging around response parsing
The DO was dying during response.text(): after receiving headers (200) but while streaming the response body from DeepSeek. Added a 5-second heartbeat interval during body reading to:
- Keep the DO active during slow response streaming
- Update lastUpdate to prevent watchdog false triggers
- Log progress to diagnose slow responses
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: add heartbeat during response.text() to prevent DO death
The DO keeps dying during response.text() before any heartbeat fires, which suggests Cloudflare is hard-killing the DO, not just timing out. Changes:
- Heartbeat every 2s instead of 5s during body reading
- Add a 30s timeout wrapper around response.text()
- Checkpoint every 3 tools instead of 5 (less lost progress)
If the timeout fires, we'll see an error. If the DO still dies silently, the issue is Cloudflare terminating the process entirely.
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: more aggressive heartbeat and timeout during response.text()
Root cause found: the DeepSeek API sends HTTP 200 headers but then hangs while streaming the response body. The 30s timeout catches this. Added:
- Retry loop with up to 3 attempts for API calls
- Automatic retry on response.text() timeout
- 2-second delay between retries
- Logging to track retry attempts
This should make the bot much more resilient to DeepSeek's occasional response-streaming hangs.
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: add retry logic for DeepSeek API timeouts
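The retry loop described in this commit can be sketched roughly as follows, assuming the 3-attempt / 2-second-delay values from the message. The function name and signature are illustrative, not the actual client code:

```typescript
// Illustrative sketch of the retry pattern, not the actual Moltworker code.
// Retries the whole API call on any failure, pausing between attempts;
// the last error is rethrown once attempts are exhausted.
async function callWithRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  delayMs = 2_000,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Wait before the next attempt (skipped after the final failure).
      if (i < attempts - 1) await new Promise((r) => setTimeout(r, delayMs));
    }
  }
  throw lastErr;
}
```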
- DeepSeek V3.2 is the latest version, with GPT-5-class reasoning
- Routes through OpenRouter instead of the direct API (more reliable)
- Same cheap pricing: $0.25/$0.38 per 1M tokens
- Avoids the streaming-hang issues seen with the direct DeepSeek API
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
Fixes the response.text() hang issue with DeepInfra-routed models (Qwen3 Coder, etc.). Changes:
- Add chatCompletionStreamingWithTools() method to OpenRouterClient:
  - Uses SSE streaming (stream: true) to read the response incrementally
  - 30s idle timeout with AbortController for clean cancellation
  - Accumulates tool_call deltas by index
  - Returns the same ChatCompletionResponse structure as non-streaming
  - stream_options.include_usage for token tracking
- Update TaskProcessor to use streaming for the OpenRouter provider:
  - Non-OpenRouter providers keep the existing fetch-based approach
  - Progress callback updates the watchdog every 50 chunks
  - Retry logic preserved (3 attempts)
Why streaming fixes the hang:
- Non-streaming: response.text() waits for the entire body and can hang indefinitely
- Streaming: reads small chunks incrementally and detects stalls via the idle timeout
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
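As a rough illustration of the "accumulates tool_call deltas by index" step: in the OpenAI-style SSE format, each chunk carries partial tool-call data keyed by index, and the argument string arrives in fragments that must be concatenated. The types and helper name below are assumptions for the sketch, not the actual OpenRouterClient code:

```typescript
// Sketch: merge streamed tool_call deltas (OpenAI-style chunk format) into
// complete tool calls. The first delta for an index carries id/name; later
// deltas append fragments of the JSON arguments string.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface AssembledToolCall {
  id: string;
  function: { name: string; arguments: string };
}

function accumulateToolCalls(deltas: ToolCallDelta[]): AssembledToolCall[] {
  const calls: AssembledToolCall[] = [];
  for (const d of deltas) {
    // Create the slot for this index on first sight, then merge into it.
    const call = (calls[d.index] ??= { id: "", function: { name: "", arguments: "" } });
    if (d.id) call.id = d.id;
    if (d.function?.name) call.function.name += d.function.name;
    if (d.function?.arguments) call.function.arguments += d.function.arguments;
  }
  return calls;
}
```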
Claude/review merge conflicts yv ug x
Without this, if fetch() hangs before returning a response, the idle timeout never starts and we wait for the 90s watchdog. Now:
- 60s timeout on the initial fetch (before streaming starts)
- 30s idle timeout during streaming (resets on each chunk)
- Better error messages: "connection timeout" vs "idle timeout"
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: add 60s timeout on initial fetch for streaming
Root cause (from Grok research):
- Cloudflare Workers aggressively pool outbound connections
- After many requests to the same host, pooled connections become stale
- Reusing a stale connection causes fetch() to hang indefinitely
- AbortController doesn't reliably interrupt stuck pooled connections
Fix:
- Add a unique `_nc` query param to each request URL
- This forces potentially new connections, bypassing the stale pool
- Tradeoff: ~100-300ms extra latency per call (new TLS handshake)
- Benefit: eliminates the hangs entirely in most cases
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: add unique query param to bypass stale connection pooling
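A minimal sketch of the cache-busting idea, assuming the `_nc` parameter name from the commit message (the helper name is hypothetical):

```typescript
// Sketch: append a unique `_nc` query parameter so each request URL is one
// the connection pool has not seen, nudging the runtime toward a fresh
// connection instead of a possibly-stale pooled one.
function withNoConnectionReuse(url: string): string {
  const u = new URL(url);
  // Unique per call: timestamp plus a random suffix.
  u.searchParams.set("_nc", `${Date.now()}-${Math.random().toString(36).slice(2)}`);
  return u.toString();
}
```

The existing query string and path are preserved; only the extra parameter changes between calls.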
- Increased the idle timeout from 30s to 45s per Grok's analysis
- Added diagnostic info (model ID, content length) to timeout errors
- Note: the iteration-10 hang was likely caused by a version rollout during the test
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: increase streaming idle timeout to 45s for network resilience
AbortController only affects fetch(), not subsequent reader.read() calls. When the stream hangs mid-read, the abort signal doesn't interrupt it. Now each reader.read() is wrapped in Promise.race with a 45s timeout, ensuring mid-stream hangs are properly detected and trigger retries. https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: use Promise.race timeout on reader.read() for mid-stream hangs
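The Promise.race wrapper described above can be sketched like this; a hung reader.read() surfaces as a timeout error instead of blocking forever. Names are illustrative, not the actual code:

```typescript
// Sketch: race a single reader.read() against a timer. Aborting the fetch
// does not interrupt an in-flight read, so the race is what detects
// mid-stream hangs and lets the caller trigger a retry.
class IdleTimeoutError extends Error {}

async function readWithTimeout<T>(read: Promise<T>, ms: number): Promise<T> {
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new IdleTimeoutError(`no data for ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first wins; a stalled read loses to the timer.
    return await Promise.race([read, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```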
The "task stopped unexpectedly" message was misleading users by suggesting CPU issues. Updated to correctly indicate API timeouts or network issues, and prompt them to tap Resume. https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
- Add autoResume flag to TaskState and TaskRequest
- Implement auto-resume in the alarm handler (up to 10 attempts)
- Add /automode (or /auto) command to toggle the setting
- Show auto-resume status in the /status command
- Update the error message to mention API timeouts instead of CPU
When enabled, tasks automatically resume on timeout instead of requiring a manual "Resume" button tap. Useful for long-running tasks with intermittent API timeouts.
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
Claude/review merge conflicts yv ug x
When resuming from checkpoint, the model would re-read rules and re-acknowledge the task instead of continuing implementation. This adds a [SYSTEM RESUME NOTICE] message to the conversation when loading a checkpoint, instructing the model to skip the acknowledgment phase and continue directly with implementation. Root cause: The skill prompt says "read rules and acknowledge", and the model follows that instruction on every resume. https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: add resume instruction to break re-acknowledgment loop
Auto-resume was failing for direct provider models (DeepSeek, DashScope, Moonshot) because the API keys weren't stored in TaskState and weren't passed to the reconstructed TaskRequest. Now stores dashscopeKey, moonshotKey, deepseekKey in TaskState and passes them through during auto-resume. https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: store direct API keys for auto-resume recovery
- Replace invalid deepchimera (deepseek-r1t2-chimera) with deepfree (deepseek-r1:free)
- Replace invalid mimo (xiaomi/mimo-v2) with nemofree (mistral-nemo:free)
- Fix devstral to use mistralai/devstral-small:free (valid free model)
- Fix grok to use the x-ai/ prefix instead of xai/
- Fix grokcode to x-ai/grok-code-fast-1
- Fix flash to google/gemini-3-flash-preview
- Fix geminipro to google/gemini-3-pro-preview
- Fix mistrallarge to mistralai/mistral-large-2512
Added new models:
- qwencoderfree: qwen/qwen3-coder:free (480B MoE free coding model)
- llama70free: meta-llama/llama-3.3-70b-instruct:free
- trinitymini: arcee-ai/trinity-mini:free (fast reasoning)
- devstral2: mistralai/devstral-2512 (paid premium coding)
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
fix: update invalid OpenRouter model IDs
Deep analysis of how steipete's projects (mcporter, Peekaboo, CodexBar, oracle) and the current OpenRouter tool-calling model landscape can improve Moltworker. Identifies 7 architectural gaps (parallel execution, MCP integration, reasoning control, etc.) with 8 actionable recommendations prioritized by effort/impact. https://claude.ai/code/session_011qMKSadt2zPFgn2GdTTyxH
Add comprehensive tool-calling landscape and steipete ecosystem analysis
Checkpoints are now persistent:
- Removed the 1-hour expiry: saves persist until manually deleted
- Checkpoints include the task prompt for better display
New save-slot system for multiple projects:
- /saves - List all saved checkpoints with details
- /save [name] - Show checkpoint info
- /saveas <name> - Backup current progress to a named slot
- /load <name> - Restore from a named slot
- /delsave <name> - Delete a checkpoint
Storage methods added:
- listCheckpoints() - List all checkpoints for a user
- getCheckpointInfo() - Get checkpoint metadata without full messages
- deleteCheckpoint() - Delete a specific checkpoint
- copyCheckpoint() - Copy between slots (for backup/restore)
Also updated the help message with the new commands and fixed outdated model references (deepchimera/mimo → deepfree/qwencoderfree).
https://claude.ai/code/session_01CoLZ1rPPP3Th81EGm55GAi
…DcOgI feat(acontext): Phase 2.3 Acontext observability integration
Add a holiday banner to the daily briefing using the Nager.Date public-holidays API (100+ countries). Reverse-geocodes the user's coordinates to determine the country code, queries Nager.Date for today's holidays, and displays a banner with holiday names (including local names) before the weather section. Non-blocking: gracefully skipped on any failure.
- New fetchBriefingHolidays() with NagerHoliday type
- Integrated into the generateDailyBriefing parallel fetch
- 9 new tests (689 total), typecheck clean
AI: Claude Opus 4.6 (Session: 01SE5WrUuc6LWTmZC8WBXKY4)
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI feat(tools): Phase 2.5.9 holiday awareness via Nager.Date API
Replace the naive compressContext (keep N recent, drop the rest) and estimateTokens (chars/4) with a smarter token-budgeted system that:
- Assigns priority scores to messages (by role, recency, content type)
- Maintains tool_call/result pairing for API compatibility
- Summarizes evicted content (tool names, file paths, response snippets)
- Greedily fills the budget from the highest priority downward
New module: src/durable-objects/context-budget.ts (pure functions)
28 new tests, 717 total passing.
AI: Claude Opus 4.6 (Session: 018M5goT7Vhaymuo8AxXhUCg)
https://claude.ai/code/session_018M5goT7Vhaymuo8AxXhUCg
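The greedy budget-filling idea can be sketched as below. The role scores and the chars/4 estimator here are placeholders for illustration; the real context-budget.ts also preserves tool_call/result pairing and summarizes evicted content, which this sketch omits:

```typescript
// Hypothetical sketch of greedy, priority-ordered budget filling:
// score each message, keep the highest-priority ones that fit, then
// re-emit the survivors in their original conversation order.
interface Msg {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Placeholder priorities; the actual module scores by role, recency,
// and content type with different values.
const ROLE_PRIORITY: Record<Msg["role"], number> = {
  system: 100,
  user: 60,
  tool: 40,
  assistant: 20,
};

// Placeholder estimator (chars/4), standing in for a real tokenizer.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

function compressToBudget(messages: Msg[], budget: number): Msg[] {
  // Rank by role priority; a small recency bonus breaks ties in favor
  // of later messages.
  const ranked = messages
    .map((m, i) => ({ m, i, score: ROLE_PRIORITY[m.role] + i * 0.01 }))
    .sort((a, b) => b.score - a.score);
  const keep = new Set<number>();
  let used = 0;
  for (const { m, i } of ranked) {
    const cost = estimateTokens(m.content);
    if (used + cost > budget) continue; // doesn't fit, try cheaper candidates
    used += cost;
    keep.add(i);
  }
  return messages.filter((_, i) => keep.has(i));
}
```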
The Acontext platform domain is acontext.io (by memodb-io), not acontext.com. Updates the default base URL in the client and the env type comment. https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI fix(acontext): correct API base URL from acontext.com to acontext.io
…NF641 feat(task-processor): Phase 4.1 token-budgeted context retrieval
Audit and harden token-budgeted retrieval with safer tool pairing, transitive keep-set closure, model-aware context budgets, and expanded edge-case coverage plus audit documentation.
AI: GPT-5.2-Codex (Session: codex-phase-4-1-audit-001)
…-budget-implementation fix(task-processor): Harden Phase 4.1 context-budget (safer tool pairing, model-aware budgets, estimator tweaks)
…check Cherry-pick the best parts of Codex PR #121 on top of PR #120:
- Rebalance priority scoring: tool results 40→55, plain assistant 20→18, system role added at 45, so tool evidence now survives over intermediate assistant reasoning during compression
- Add a final safety check to drop the summary if it pushes the result over budget
- Update existing tests to tolerate the summary being dropped on tight budgets
- Add 4 new tests: summary drop, system priority, out-of-order tools
All 731 tests pass, typecheck clean.
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI fix(context-budget): improve priority scoring and add summary safety …
Prevent Cloudflare DO 30s CPU hard-kill by adding per-phase time budgets with checkpoint-save-before-crash behavior.
- Add phase-budget.ts helper with budget constants (plan=8s, work=18s, review=3s)
- Check elapsed time before each API call and tool execution
- On budget exceeded: save a checkpoint, increment autoResumeCount, let the watchdog resume
- Reset the phase clock on phase transitions and checkpoint resume
- Add PhaseBudgetExceededError with phase/elapsed/budget metadata
- Add comprehensive unit tests for budget checks and constants
https://claude.ai/code/session_01AtnWsZSprM6Gjr9vjTm1xp
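The per-phase budget check can be sketched as follows. The budget constants and error fields come from the commit message; the function names are illustrative:

```typescript
// Sketch of a per-phase time-budget guard. The caller invokes
// checkPhaseBudget() before each API call or tool execution; on a throw it
// saves a checkpoint and lets the watchdog resume the task.
const PHASE_BUDGET_MS = { plan: 8_000, work: 18_000, review: 3_000 } as const;
type Phase = keyof typeof PHASE_BUDGET_MS;

class PhaseBudgetExceededError extends Error {
  constructor(
    public phase: Phase,
    public elapsedMs: number,
    public budgetMs: number,
  ) {
    super(`phase "${phase}" exceeded budget: ${elapsedMs}ms > ${budgetMs}ms`);
  }
}

// `now` is injectable for testing; defaults to the wall clock.
function checkPhaseBudget(phase: Phase, phaseStartedAt: number, now = Date.now()): void {
  const elapsed = now - phaseStartedAt;
  if (elapsed > PHASE_BUDGET_MS[phase]) {
    throw new PhaseBudgetExceededError(phase, elapsed, PHASE_BUDGET_MS[phase]);
  }
}
```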
…elist Replace Promise.all with Promise.allSettled for parallel tool execution so one failed tool doesn't cancel the others. Add a PARALLEL_SAFE_TOOLS whitelist to control which tools can run in parallel vs sequentially.
- Add PARALLEL_SAFE_TOOLS set (11 read-only tools: fetch_url, browse_url, get_weather, get_crypto, github_read_file, github_list_files, fetch_news, convert_currency, geolocate_ip, url_metadata, generate_chart)
- Mutation tools (github_api, github_create_pr, sandbox_exec) are always sequential
- Parallel path only when ALL tools are safe AND the model has parallelCalls: true
- Promise.allSettled maps rejected results to error messages with tool_call_id
- Mixed safe+unsafe batches fall back to sequential execution
- Add tests for isolation, sequential fallback, error propagation, and the whitelist
https://claude.ai/code/session_01AtnWsZSprM6Gjr9vjTm1xp
…parallel-bAtHI Claude/budget circuit breakers parallel b at hi
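The parallel-execution policy above can be sketched like this. The whitelist contents come from the commit message; the executor signature and types are illustrative, and the model's parallelCalls capability check is omitted for brevity:

```typescript
// Sketch: run a tool batch in parallel only when every tool is read-only;
// Promise.allSettled isolates failures so one rejected tool doesn't cancel
// the rest. Mixed or mutating batches fall back to sequential execution.
const PARALLEL_SAFE_TOOLS = new Set([
  "fetch_url", "browse_url", "get_weather", "get_crypto", "github_read_file",
  "github_list_files", "fetch_news", "convert_currency", "geolocate_ip",
  "url_metadata", "generate_chart",
]);

interface BatchToolCall { id: string; name: string; args: unknown }
interface BatchToolResult { tool_call_id: string; content: string }

async function runBatch(
  calls: BatchToolCall[],
  exec: (c: BatchToolCall) => Promise<string>,
): Promise<BatchToolResult[]> {
  const allSafe = calls.every((c) => PARALLEL_SAFE_TOOLS.has(c.name));
  if (!allSafe) {
    // Mutating or mixed batch: strict sequential execution.
    const out: BatchToolResult[] = [];
    for (const c of calls) {
      out.push({
        tool_call_id: c.id,
        content: await exec(c).catch((e) => `Error: ${e}`),
      });
    }
    return out;
  }
  // All read-only: run in parallel, mapping rejections to error strings.
  const settled = await Promise.allSettled(calls.map(exec));
  return settled.map((s, i) => ({
    tool_call_id: calls[i].id,
    content: s.status === "fulfilled" ? s.value : `Error: ${s.reason}`,
  }));
}
```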
Fix inconsistencies left by the sprint session:
- GLOBAL_ROADMAP: 12→14 tools (add github_create_pr, sandbox_exec)
- GLOBAL_ROADMAP: Phase 1.1 clarifies that client.ts still uses Promise.all
- GLOBAL_ROADMAP: Add Sprint 48h section with a risk-mitigation note
- GLOBAL_ROADMAP: Fix dependency graph Phase 1 status
- next_prompt: Add sprint tasks to recently completed
- WORK_STATUS: Add S48.1/S48.2 tasks, update velocity (762 tests)
- claude-log: Add a sprint session entry with audit notes
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…th real BPE tokenizer Integrate gpt-tokenizer (cl100k_base encoding) for exact token counting in the context-budget system. The heuristic chars/4 estimator is kept as a safe fallback if the tokenizer throws.
- New: src/utils/tokenizer.ts with countTokens() and estimateTokensHeuristic()
- Modified: context-budget.ts, estimateStringTokens now delegates to the real tokenizer
- 18 new tokenizer tests, 772 total (all passing)
- Bundle impact: +1.1 MB (cl100k_base BPE ranks), well within the Cloudflare 10 MB limit
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
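The fallback wiring can be sketched as below. The encoder is injected here so the sketch stays self-contained; per the commit message, the real module imports gpt-tokenizer's cl100k_base encoding instead. Function names follow the commit message, but this is not the actual tokenizer.ts:

```typescript
// Sketch: prefer an exact BPE token count, fall back to the chars/4
// heuristic if no encoder is available or the encoder throws.
type Encoder = (text: string) => number[];

function estimateTokensHeuristic(text: string): number {
  return Math.ceil(text.length / 4);
}

function countTokens(text: string, encode?: Encoder): number {
  if (encode) {
    try {
      return encode(text).length;
    } catch {
      // A tokenizer failure must never take down the budget pass;
      // silently fall through to the heuristic.
    }
  }
  return estimateTokensHeuristic(text);
}
```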
Best-of-5 Codex review: scored all candidate branches, extracted and fixed code from branch 4 (-8zikq4, 8/10). Adds backend route, API client types, AcontextSessionsSection component with status dots, age formatting, and responsive grid. 13 new tests (785 total). https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI Claude/implement p1 guardrails dc og i
…dedup Consolidated the best patterns from 4 parallel Codex implementations (PR130–133):
- PR2's DRY `executeToolWithCache()` method (single entry point, no code duplication)
- PR2's case-insensitive regex error detection (`/^error(?: executing)?/i`)
- PR3's in-flight promise dedup cache (prevents duplicate API calls for identical parallel tool calls in the same batch)
- PR3's explicit cache reset in `processTask()` (correct for DO instance reuse)
- PR1's relative call-count test pattern (robust against mock accumulation)
The cache only applies to PARALLEL_SAFE_TOOLS (read-only). Mutation tools (github_api, github_create_pr, sandbox_exec) always bypass the cache. Error results are never cached, to allow retries.
5 new tests (790 total), typecheck clean.
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI feat(task-processor): Phase 4.3 — tool result caching with in-flight …
… quotes & personality Phase 4.4 (cross-session context continuity):
- Extended LastTaskSummary with resultSummary (first 500 chars of the response)
- Increased the TTL from 1h to 24h for cross-task context
- Added SessionSummary interface + ring buffer (20 entries per user in R2)
- Added storeSessionSummary, loadSessionHistory, getRelevantSessions, formatSessionsForPrompt
- Session context injected at all 3 system-prompt sites (main, vision, orchestra)
- 19 new tests for session storage, loading, relevance scoring, and formatting
Phase 2.5.10 (quotes & personality):
- Added fetchRandomQuote (Quotable API) with fetchRandomAdvice (Advice Slip) fallback
- Added fetchBriefingQuote exported function for testing
- Quote section added to generateDailyBriefing via Promise.allSettled (zero latency impact)
- Quote appears at the end of the briefing, silently skipped if both APIs fail
- 7 new tests for quote fetching and briefing integration
820 tests pass (790 + 30 new), typecheck clean.
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
…DcOgI feat(learnings+tools): Phase 4.4 cross-session context + Phase 2.5.10…
Force-pushed from 20ea74f to ae6a103.
Implement Phase 5.5 web_search tool with Brave API execution, TTL cache, TaskProcessor/Telegram key plumbing, and test-coverage updates.
AI: GPT-5.2-Codex (Session: codex-phase-5-5-web-search-001)
Cherry-pick the best of both Codex PRs:
- PR 136: input validation (query.trim), Number.parseInt, error format with status code, braveSearchKey in the non-DO toolContext
- PR 137: tool ordering (web_search after fetch_news), vi.useFakeTimers for the TTL test, briefing-aggregator test counts 15 tools
https://claude.ai/code/session_01SE5WrUuc6LWTmZC8WBXKY4
Force-pushed from ae6a103 to 457ce29.
No description provided.