Skip to content

fix: SSE streaming reliability — heartbeat race, idle watchdog, fast-path, e2e tests#79

Merged
samueltuyizere merged 23 commits into
routatic:mainfrom
ddawnlll:main
Jun 19, 2026
Merged

fix: SSE streaming reliability — heartbeat race, idle watchdog, fast-path, e2e tests#79
samueltuyizere merged 23 commits into
routatic:mainfrom
ddawnlll:main

Conversation

@ddawnlll

Copy link
Copy Markdown
Contributor

Summary

Fixes three streaming SSE proxy reliability bugs and adds comprehensive streaming e2e tests. The primary symptom was API Error: InvalidHTTPResponse from the Anthropic SDK during large file writes (long-running streams where the heartbeat had time to race with SSE writes).

Changes

1. CRITICAL: ResponseWriter race condition (messages.go)

Root cause: The heartbeat goroutine (every 3s writing :keepalive\n\n + Flush) and the stream goroutine (writing SSE events + Flush) both wrote directly to the same http.ResponseWriter without synchronization. On long-running streams (e.g., large file writes), the heartbeat fires 20-40+ times, making the race almost certain.

Fix: Added sync.Mutex to the responseWriter wrapper. All Write, WriteHeader, and Flush calls are serialized. Both the heartbeat and stream path now flush through the mutex-protected wrapper.

Impact: Clean SSE output on concurrent writes. Zero data races confirmed via go test -race.

2. MEDIUM: Idle watchdog misclassification (stream.go + messages.go)

Root cause: After switching from http.ResponseController.SetReadDeadline to context-based idle detection, the idle watchdog calls cancel() which produces context.Canceled on Read(). But isIdleTimeoutErr() only caught context.DeadlineExceeded and net.Error.Timeout(). Idle timeouts were misclassified as generic failures and logged misleadingly as "stream proxy failed".

Fix: Added explicit check: errors.Is(err, context.Canceled) && clientCtx.Err() == nil — the upstream was canceled but the client is still connected, which can only come from the watchdog. Returns ErrStreamIdle correctly.

Affected paths: ProxyStream, ProxyResponsesStream, ProxyGeminiStream, and handleAnthropicStreaming (gained clientCtx parameter).

3. MINOR: SSE fast-path content truncation (stream.go)

Root cause: The fast-path content extractor used strings.Index(data[start:], '"') to find the end of a JSON string value. This stopped at escaped \" inside the content, silently dropping everything after it.

Fix: Replaced with a byte-by-byte walk that skips \ + the escaped character, correctly bypassing escaped quotes.

4. Refactor: idle watchdog module (idle.go, new file)

Extracted StartIdleWatchdog from stream.go into its own file. No behavior change.

5. e2e streaming tests (scripts/e2e-test.sh)

Added two new test functions: test_streaming_model (SSE with tools, verifies message_start + message_stop framing) and test_streaming_long (500-token output to exercise multiple heartbeat ticks). Five new test cases covering Go provider streaming, Zen streaming, Anthropic endpoint streaming, and long-stream heartbeat path.

Risk Analysis

Risk Severity Mitigation
Mutex contention on responseWriter Low SSE writes are small (individual events). Only the heartbeat (3s) competes — contention is negligible.
Recursive mutex lock None No responseWriter method calls another while holding the lock.
Context cancellation confusion Low clientCtx.Err() == nil check cleanly distinguishes watchdog from client disconnect. The upstream cancel() function is still called once, then becomes a no-op on subsequent calls.
Regression on fast path Low The fix only changes how the closing quote is found — the extracted content is the same for non-escaped content (the common case). Existing 22 stream tests cover this path.
e2e streaming test flakiness Low Tests verify SSE framing structure, not specific content. Network-level failures would also fail the non-streaming tests.

Verification

  • go build ./... — success
  • go test -race ./... — 302 passed in 12 packages, 0 races
  • bash -n scripts/e2e-test.sh — syntax OK
  • go vet ./... — no issues
  • Previous e2e run showed all non-streaming tests passing against live upstream

Known Limitations

  • The SSE fast path still does not handle JSON unicode escapes (\uXXXX) — these are rare in LLM output and would be handled by the JSON fallback path instead.
  • Long streams with extreme token counts (>100K) have not been tested in CI but the same code paths apply.

🤖 Generated with Claude Code

dawn_ll and others added 6 commits June 16, 2026 23:45
… spiral

Go provider models (minimax, qwen) now go through the OpenAI Chat Completions transform path instead of the raw Anthropic endpoint, preventing "Unknown server-tool shorthand" 400 errors from Claude Code's MCP tool format.

Changes:
- Add AnthropicToolsDisabled config flag for models that don't support Anthropic tool format
- Add sanitizeAnthropicBody() to strip tool type fields in raw Anthropic path (also protects Zen Claude models)
- Add 4xx fail-fast in fallback handler — non-retryable errors skip circuit breaker to avoid opening it for format mismatches
- Add temperature constraint for kimi-k2.7-code (only accepts 1.0)
- Remove Go provider models from IsAnthropicModel() — all Go models now use the Chat Completions transform path
- Comprehensive tests for sanitization, retryable errors, circuit breaker behavior, and temperature constraints

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… spiral

Go provider models (minimax, qwen) now go through the OpenAI Chat Completions transform path instead of the raw Anthropic endpoint, preventing "Unknown server-tool shorthand" 400 errors from Claude Code's MCP tool format.

Changes:
- Add AnthropicToolsDisabled config flag for models that don't support Anthropic tool format
- Add sanitizeAnthropicBody() to strip tool type fields in raw Anthropic path (also protects Zen Claude models)
- Add 4xx fail-fast in fallback handler — non-retryable errors skip circuit breaker to avoid opening it for format mismatches
- Add temperature constraint for kimi-k2.7-code (only accepts 1.0)
- Remove Go provider models from IsAnthropicModel() — all Go models now use the Chat Completions transform path
- Comprehensive tests for sanitization, retryable errors, circuit breaker behavior, and temperature constraints
- E2E test script validating all models with tool-format requests

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
qwen3.7-max on OpenCode Go provider doesn't support the OpenAI Chat
Completions format ("oa-compat"). It must stay on the raw Anthropic
endpoint, but will be protected by the body sanitization added in the
previous commit.

Other qwen models (qwen3.5/3.6/3.7-plus) work fine through the
transform path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ne only

Removes http.Client.Timeout that was killing streams after 5 min regardless
of activity. Server WriteTimeout set to 0. Each upstream read uses a
per-Read deadline via http.ResponseController.SetReadDeadline that is
renewed on every successful byte. Only an idle gap exceeding
stream_timeout_ms (defaults to timeout_ms) treats the connection as stuck
and routes to the next fallback model.

Also demotes "client disconnected during stream" logs from Info to Debug —
this is normal during Claude Code tool execution, not a failure signal.

Adds StreamTimeoutMs config field to both OpenCodeGo and OpenCodeZen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…sclassification, fast-path edge case, and add e2e streaming tests

Fixes three bugs in the streaming SSE proxy path:

1. **CRITICAL — ResponseWriter race condition** (messages.go):
   The heartbeat goroutine (every 3s) and the stream goroutine wrote
   concurrently to the same http.ResponseWriter with no synchronization.
   Interleaved writes corrupted SSE frames, causing the Anthropic SDK to
   report InvalidHTTPResponse. Fixed by adding a sync.Mutex to the
   responseWriter wrapper and serializing all Write/WriteHeader/Flush calls.

2. **MEDIUM — Idle watchdog produces wrong error type** (stream.go):
   After switching from SetReadDeadline to context.Cancel-based idle
   detection, the cancel() call produced context.Canceled on Read(),
   which was not caught by isIdleTimeoutErr(). Added explicit check:
   errors.Is(err, context.Canceled) && clientCtx.Err() == nil → ErrStreamIdle.
   Same fix applied to ProxyStream, ProxyResponsesStream, ProxyGeminiStream,
   and handleAnthropicStreaming (added clientCtx parameter).

3. **MINOR — SSE fast-path content truncation** (stream.go):
   Used strings.Index on raw JSON content to find the terminating quote,
   which stopped at escaped \" inside content. Replaced with a byte walk
   that skips backslash-escaped characters.

4. Added internal/transformer/idle.go with StartIdleWatchdog for
   context-based stream idle detection (extracted from stream.go).

5. **e2e tests**: Added streaming SSE verification tests and a long-stream
   test that exercises the heartbeat path (max_tokens: 500, 120s timeout).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…lback-death-spiral

fix: SSE streaming reliability — heartbeat race, idle watchdog, fast-path, e2e tests
Comment thread internal/handlers/messages.go
Comment thread internal/handlers/messages.go Outdated
Comment thread internal/handlers/messages.go
Comment thread internal/handlers/messages.go
Comment thread scripts/e2e-test.sh
@kilo-code-bot

kilo-code-bot Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Code Review Roast 🔥

Verdict: 1 Critical + 2 Warnings | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 1
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 644 🔥 The Roast: This line is doing a stellar impression of a car with no engine — it looks like code but won't go anywhere. You deleted obj["model"] = encoded (the entire purpose of this function) and pasted a dangling if that references toolMap, a variable that doesn't exist in this scope. This neither compiles nor replaces the model. It's a ghost town with a "for sale" sign and the foundation bulldozed.
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ":keepalive\n\n") which routes through Write() and unconditionally sets ssePayloadWritten = true. A comment-only :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 410 🔥 The Roast: The catch-all error handler returns true (continue to next model) even when rw.ssePayloadWritten is true. A 500 from upstream can become duplicate message_start chaos wearing a "retry" hat. Your fallback becomes a protocol violator.

🏆 Best part: The previous decodeErrors reset fix at internal/transformer/stream.go:365 is genuinely surgical. I'm shocked.

💀 Worst part: replaceModelInRawBody at line 644 has had its brain removed and replaced with a toolMap phantom. This function no longer does its job, doesn't compile, and the test suite presumably doesn't notice because it's been told to look away.

📊 Overall: A PR that swings from "decent bug fixes" to "broke the only thing that worked" in six lines. Like finding out your dishwasher also filters your air supply — the improvement is… questionable.

Files Reviewed (4 files)
  • internal/handlers/messages.go - 3 issues (1 new critical)
  • internal/router/model_router.go - no change
  • internal/transformer/idle.go - no change
  • internal/transformer/stream.go - 1 issue partially fixed previously

Fix these issues in Kilo Cloud

Previous Review Summaries (8 snapshots, latest commit e112ea1)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit e112ea1)

Verdict: 1 Critical + 2 Warnings | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 1
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 644 🔥 The Roast: This line is doing a stellar impression of a car with no engine — it looks like code but won't go anywhere. You deleted obj["model"] = encoded (the entire purpose of this function) and pasted a dangling if that references toolMap, a variable that doesn't exist in this scope. This neither compiles nor replaces the model. It's a ghost town with a "for sale" sign and the foundation bulldozed.
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ":keepalive\n\n") which routes through Write() and unconditionally sets ssePayloadWritten = true. A comment-only :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 410 🔥 The Roast: The catch-all error handler returns true (continue to next model) even when rw.ssePayloadWritten is true. A 500 from upstream can become duplicate message_start chaos wearing a "retry" hat. Your fallback becomes a protocol violator.

🏆 Best part: The previous decodeErrors reset fix at internal/transformer/stream.go:365 is genuinely surgical. I'm shocked.

💀 Worst part: replaceModelInRawBody at line 644 has had its brain removed and replaced with a toolMap phantom. This function no longer does its job, doesn't compile, and the test suite presumably doesn't notice because it's been told to look away.

📊 Overall: A PR that swings from "decent bug fixes" to "broke the only thing that worked" in six lines. Like finding out your dishwasher also filters your air supply — the improvement is… questionable.

Files Reviewed (4 files)
  • internal/handlers/messages.go - 3 issues (1 new critical)
  • internal/router/model_router.go - no change
  • internal/transformer/idle.go - no change
  • internal/transformer/stream.go - 1 issue partially fixed previously

Fix these issues in Kilo Cloud

Previous review (commit 868504a)

Verdict: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 0
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ":keepalive\n\n") which routes through Write() and unconditionally sets ssePayloadWritten = true. A comment-only :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 410 🔥 The Roast: The catch-all error handler returns true (continue to next model) even when rw.ssePayloadWritten is true. A 500 from upstream can become duplicate message_start chaos wearing a "retry" hat. Your fallback becomes a protocol violator.

🏆 Best part: The decodeErrors reset fix is surgical — adding the counter reset to the fast-path means the watchdog won't get false positives from interleaved valid chunks. Clean.

💀 Worst part: The heartbeat still lies to ssePayloadWritten. A proxy whose pulse makes the fallback logic think the patient has started bleeding deserves a sharper diagnosis.

📊 Overall: Like finding a horror movie sequel with fewer monsters, but the ones left know where you live.

Files Reviewed (4 files)
  • internal/handlers/messages.go - 2 active issues
  • internal/router/model_router.go - no change, issue resolved previously
  • internal/transformer/idle.go - no change, issue resolved previously
  • internal/transformer/stream.go - 1 issue partially fixed

Fix these issues in Kilo Cloud

Previous review (commit 656e458)

Verdict: 4 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Active Issues (click to expand)
File Line Roast
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ...) which routes through Write() and unconditionally sets ssePayloadWritten = true. A comment-only :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 403 🔥 The Roast: ErrStreamIdle gets a special guard here, but non-idle streaming errors still hit the catch-all and return true, so fallback can continue after SSE payload has already started. A 500 from upstream can become duplicate message_start chaos wearing a "retry" hat.
internal/transformer/stream.go 361 🔥 The Roast: decodeErrors only resets after JSON parsing; valid fast-path chunks can leave the failure counter haunted by ghosts of malformed frames past. Eventually the stream gets voted off the island for crimes it didn't commit.
internal/transformer/idle.go 60 🔥 The Roast: This watchdog has a timer, but ping() only nudges lastRead without touching the timer itself. On a busy stream the idle window stretches toward 2× the configured timeout, which is less "watchdog" and more "sleeping guard dog."
Resolved Since Last Review (click to expand)
File Line Resolution
internal/handlers/messages.go 113 requestDedup nil guard prevents a panic when deduplication is not configured.
internal/handlers/messages.go 644 sanitizeAnthropicBody now removes only type:"custom" shorthand instead of stripping every tool/content type.
internal/handlers/messages.go 690 ✅ Idle watchdog cancellation is now distinguished from client disconnect in the Anthropic streaming path.
internal/router/model_router.go 171 RouteForStreaming now returns an explicit error when no streaming model is configured instead of hiding behind a zero-value route.
scripts/e2e-test.sh 66 ✅ Proxy startup now backgrounds the process with PID cleanup and health polling, so the test suite can actually start.
Previously Resolved (click to expand)
File Line Resolution
internal/handlers/messages.go 91 Flush holds the mutex across the underlying flush, serializing concurrent Write/WriteHeader/Flush access and eliminating the data race.

🏆 Best part: The model-router fix is actually clean: returning an error for a missing streaming model is the kind of boring correctness that prevents three layers of downstream panic theater.

💀 Worst part: The heartbeat keepalive still lies to ssePayloadWritten. A proxy whose pulse makes the fallback logic think the patient has started bleeding deserves a sharper diagnosis.

📊 Overall: This PR cleaned up several real gremlins, but the remaining critical SSE-state issues still need attention before merge. Like a horror movie sequel: fewer monsters, but the ones left know where you live.

🔗 Fix these issues in Kilo Cloud

Files Reviewed (6 files)
  • internal/handlers/messages.go - 2 active, 4 resolved
  • internal/router/model_router.go - 1 resolved
  • internal/router/model_router_test.go - 0 issues
  • internal/transformer/stream.go - 1 active
  • internal/transformer/idle.go - 1 active
  • scripts/e2e-test.sh - 1 resolved

Previous review (commit 3ce7feb)

Verdict: 4 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Active Issues (click to expand)
File Line Roast
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ...) which routes through Write() and unconditionally sets ssePayloadWritten = true. A comment-only :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 403 🔥 The Roast: ErrStreamIdle gets a special guard here, but non-idle streaming errors still hit the catch-all and return true, so fallback can continue after SSE payload has already started. A 500 from upstream can become duplicate message_start chaos wearing a "retry" hat.
internal/transformer/stream.go 361 🔥 The Roast: decodeErrors only resets after JSON parsing; valid fast-path chunks can leave the failure counter haunted by ghosts of malformed frames past. Eventually the stream gets voted off the island for crimes it didn't commit.
internal/transformer/idle.go 60 🔥 The Roast: This watchdog has a timer, but ping() only nudges lastRead without touching the timer itself. On a busy stream the idle window stretches toward 2× the configured timeout, which is less "watchdog" and more "sleeping guard dog."
Resolved Since Last Review (click to expand)
File Line Resolution
internal/handlers/messages.go 113 requestDedup nil guard prevents a panic when deduplication is not configured.
internal/handlers/messages.go 644 sanitizeAnthropicBody now removes only type:"custom" shorthand instead of stripping every tool/content type.
internal/handlers/messages.go 690 ✅ Idle watchdog cancellation is now distinguished from client disconnect in the Anthropic streaming path.
internal/router/model_router.go 171 RouteForStreaming now returns an explicit error when no streaming model is configured instead of hiding behind a zero-value route.
scripts/e2e-test.sh 66 ✅ Proxy startup now backgrounds the process with PID cleanup and health polling, so the test suite can actually start.
Previously Resolved (click to expand)
File Line Resolution
internal/handlers/messages.go 91 Flush holds the mutex across the underlying flush, serializing concurrent Write/WriteHeader/Flush access and eliminating the data race.

🏆 Best part: The model-router fix is actually clean: returning an error for a missing streaming model is the kind of boring correctness that prevents three layers of downstream panic theater.

💀 Worst part: The heartbeat keepalive still lies to ssePayloadWritten. A proxy whose pulse makes the fallback logic think the patient has started bleeding deserves a sharper diagnosis.

📊 Overall: This PR cleaned up several real gremlins, but the remaining critical SSE-state issues still need attention before merge. Like a horror movie sequel: fewer monsters, but the ones left know where you live.

🔗 Fix these issues in Kilo Cloud

Files Reviewed (6 files)
  • internal/handlers/messages.go - 2 active, 4 resolved
  • internal/router/model_router.go - 1 resolved
  • internal/router/model_router_test.go - 0 issues
  • internal/transformer/stream.go - 1 active
  • internal/transformer/idle.go - 1 active
  • scripts/e2e-test.sh - 1 resolved

Previous review (commit 8cf917d)

Verdict: 8 Issues Found (1 Fixed, 7 Remaining) | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 6
💡 suggestion 0
🤏 nitpick 0
Active Issues (click to expand)
File Line Roast
internal/handlers/messages.go 345 🔥 The Roast: Heartbeat keepalive frames call fmt.Fprintf(rw, ...) which routes through Write() and unconditionally sets ssePayloadWritten = true. A simple :keepalive now marks "payload started," letting the idle watchdog throw a tantrum over nothing after the first heartbeat tick.
internal/handlers/messages.go 403 🔥 The Roast: ErrStreamIdle early-falls-back here, but the same non-idle errors hit the catch-all at 468 after ssePayloadWritten might be true. A 500 from upstream becomes a duplicate message_start broadcast masquerading as graceful fallback.
internal/handlers/messages.go 644 🔥 The Roast: This sanitizer deletes every type field from ALL content blocks, including legitimate text, thinking, and tool_use types. Anthropic's schema requires these types; stripping them transforms valid requests into invalid ones.
internal/handlers/messages.go 690 🔥 The Roast: The idle watchdog cancels ctx, and this handler treats context cancellation as client disconnect — conflating a provoked timeout with a real user abort. The error response body still gets written despite the stale channels.
internal/transformer/stream.go 361 🔥 The Roast: decodeErrors only resets after JSON parsing; valid fast-path chunks can leave the failure counter haunted by ghosts of malformed frames past.
internal/transformer/idle.go 60 🔥 The Roast: This watchdog has a timer, but ping() only nudges lastRead without touching the timer itself. On a busy stream the idle window stretches toward 2× the configured timeout, and the watchdog yells at the wrong moment.
internal/router/model_router.go 174 🔥 The Roast: Returning an empty RouteResult on missing streaming model config invites zero-value model attempts downstream. A clear error here prevents a confusing panic three layers deeper.
scripts/e2e-test.sh 66 🔥 The Roast: This starts the proxy in the foreground, the test suite never runs, and somehow this is still the state of e2e infrastructure in an actively reviewed PR.
Resolved Since Last Review (click to expand)
File Line Resolution
internal/handlers/messages.go 91 🔥 Was: Flush released the mutex before calling the underlying Flush, allowing concurrent Write/Flush on *bufio.Writer - a data race. ✅ Now: Mutex is held across f.Flush(), serializing Write/WriteHeader/Flush and eliminating the race. The comment accurately reflects the trade-off (slow TCP blocks writes) instead of hand-waving about TCP ordering.

🏆 Best part: The Flush mutex fix is actually correct. Holding the lock across f.Flush() genuinely serializes *bufio.Writer access and prevents the data race. I'm as shocked as you are.

💀 Worst part: The e2e test harness still insists on starting the proxy in the foreground, making the entire test suite a hostage situation. It's been how many reviews?

📊 Overall: One gremlin caught and fixed, seven still roaming free. Progress, I suppose — like watching paint dry, but at least the color is changing.

Files Reviewed (11 files, 1 changed)
  • internal/handlers/messages.go - 5 active, 1 resolved
  • internal/transformer/stream.go - 1 active
  • internal/transformer/idle.go - 1 active
  • internal/router/model_router.go - 1 active
  • scripts/e2e-test.sh - 1 active

Previous review (commit eb8e122)

Verdict: 7 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 1
⚠️ warning 6
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
scripts/e2e-test.sh 66 The e2e script still starts the proxy in the foreground, so tests wait forever for a server that is busy existing.
internal/handlers/messages.go 93 Flush releases the mutex before the underlying flush, letting writes and flushes waltz together again.
internal/handlers/messages.go 70 Heartbeat keepalive writes mark ssePayloadWritten, so fallback can be blocked even though no real SSE payload started.
internal/handlers/messages.go 416 Non-idle streaming errors still fall back after message_start, risking duplicate message_start events.
internal/transformer/idle.go 60 ping() updates lastRead but never resets the timer, so idle timeout can stretch toward 2x.
internal/router/model_router.go 174 Missing streaming model config returns an empty route instead of an error, inviting zero-value model attempts.
internal/transformer/stream.go 361 decodeErrors only resets after JSON parsing; valid fast-path chunks can leave the failure counter haunted.

🏆 Best part: The nil dedup guard, narrowed sanitizer, and idle-watchdog classification fixes actually landed. Annoyingly, the PR learned some manners and still brought three more gremlins.

💀 Worst part: The e2e harness still blocks before running tests. A streaming reliability PR whose test script is a server-shaped statue is comedy with production consequences.

📊 Overall: Resolved since last review: requestDedup: nil, overbroad sanitizeAnthropicBody, and idle-watchdog misclassification. Remaining: SSE state tracking is still playing hopscotch, the watchdog timer is on mindfulness time, and the e2e script refuses to let the test suite start.

🔗 Fix these issues in Kilo Cloud

Files Reviewed (11 files)
  • internal/client/opencode_test.go - 0 issues
  • internal/config/loader.go - 0 issues
  • internal/handlers/messages.go - 3 issues
  • internal/handlers/messages_test.go - 0 issues
  • internal/router/fallback.go - 0 issues
  • internal/router/model_router.go - 1 issue
  • internal/server/server.go - 0 issues
  • internal/transformer/idle.go - 1 issue
  • internal/transformer/request_test.go - 0 issues
  • internal/transformer/stream.go - 1 issue
  • scripts/e2e-test.sh - 1 carried-forward issue

Previous review (commit 0d9ca7f)

Verdict: 5 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 3
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 102 requestDedup: nil causes a nil-pointer panic on every request at TryAcquire.
internal/handlers/messages.go 505 Falling back after ProxyStream emits message_start can produce duplicate message_start events in one SSE stream.
internal/handlers/messages.go 642 sanitizeAnthropicBody deletes every tool type, not just Claude Code's custom shorthand.
internal/handlers/messages.go 687 Idle watchdog cancellation is misclassified as client disconnect in handleAnthropicStreaming.
scripts/e2e-test.sh 62 The e2e script starts the server in the foreground and blocks before health checks/tests can run.

🏆 Best part: The mutex-protected responseWriter is a legit fix for the heartbeat race; annoyingly, some code here behaved like it has production experience.

💀 Worst part: requestDedup: nil turns a routine request into an immediate panic. That is a spectacular way to make a reliability PR unreliable.

📊 Overall: The streaming fixes are real, but this PR also smuggled in a runtime panic and a broken e2e harness. Like bringing a fire extinguisher to a fire and also pouring gasoline on the couch.

🔗 Fix these issues in Kilo Cloud

Files Reviewed (20 files)
  • CLAUDE.md - 0 issues
  • cmd/oc-go-cc/main.go - 0 issues
  • internal/client/opencode.go - 0 issues
  • internal/client/opencode_test.go - 0 issues
  • internal/config/config.go - 0 issues
  • internal/config/loader.go - 0 issues
  • internal/config/loader_test.go - 0 issues
  • internal/handlers/messages.go - 4 issues
  • internal/handlers/messages_test.go - 0 issues
  • internal/router/fallback.go - 0 issues
  • internal/router/fallback_test.go - 0 issues
  • internal/router/model_router.go - 0 issues
  • internal/router/model_router_test.go - 0 issues
  • internal/server/server.go - 0 issues
  • internal/transformer/idle.go - 0 issues
  • internal/transformer/request.go - 0 issues
  • internal/transformer/request_test.go - 0 issues
  • internal/transformer/stream.go - 0 issues
  • internal/transformer/stream_test.go - 0 issues
  • scripts/e2e-test.sh - 1 issue

Previous review (commit 71235e3)

Verdict: 5 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 3
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 102 requestDedup: nil causes a nil-pointer panic on every request at TryAcquire.
internal/handlers/messages.go 505 Falling back after ProxyStream emits message_start can produce duplicate message_start events in one SSE stream.
internal/handlers/messages.go 642 sanitizeAnthropicBody deletes every tool type, not just Claude Code's custom shorthand.
internal/handlers/messages.go 687 Idle watchdog cancellation is misclassified as client disconnect in handleAnthropicStreaming.
scripts/e2e-test.sh 62 The e2e script starts the server in the foreground and blocks before health checks/tests can run.

🏆 Best part: The mutex-protected responseWriter is a legit fix for the heartbeat race; annoyingly, some code here behaved like it has production experience.

💀 Worst part: requestDedup: nil turns a routine request into an immediate panic. That is a spectacular way to make a reliability PR unreliable.

📊 Overall: The streaming fixes are real,

[Snapshot truncated.]

Additional previous summary content was truncated to keep this comment within platform limits.


Reviewed by nex-n2-pro:free · 416,354 tokens

dawn_ll and others added 7 commits June 18, 2026 02:51
requestDedup is intentionally nil when NewMessagesHandler
creates the handler without a deduplicator. Guard the
TryAcquire call with a nil check.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ctx.Done, e2e backgrounding

Five fixes from code review:

1. ssePayloadWritten guards on ErrStreamIdle continue paths:
   ProxyStream emits message_start before knowing the upstream is healthy.
   If ErrStreamIdle fires and we continue to the next model, the client
   would see duplicate message_start events. Guard all 4 streaming
   fallback paths (Zen Anthropic, Responses, Gemini, OpenAI) with
   rw.ssePayloadWritten checks — if SSE output has already started,
   send a stream error and return instead of continuing.

2. Narrow sanitizeAnthropicBody to delete only "type":"custom":
   The sanitizer was deleting every type field from every tool, not
   just the Claude Code "type":"custom" server-tool shorthand.
   Legitimate Anthropic tools that use type (e.g. "type":"function")
   were being stripped. Now only deletes type when the value is "custom".

3. handleAnthropicStreaming ctx.Done() distinguishes watchdog vs client:
   Both the idle watchdog and client disconnect cancel the same context.
   The select case returned ErrClientDisconnected unconditionally.
   Now checks clientCtx.Err() — if client is still connected, the
   cancellation came from the watchdog → return ErrStreamIdle.

4. e2e-test.sh: start proxy with explicit & backgrounding and PID-
   based cleanup. Health check polls /health with 10s timeout instead
   of a blind sleep. Cleanup kills the captured PID for reliable
   teardown in CI.

5. nil guard on requestDedup (committed earlier at c558ae9).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread internal/handlers/messages.go
Comment thread internal/transformer/stream.go
Comment thread internal/transformer/idle.go
Comment thread internal/router/model_router.go
Comment thread internal/handlers/messages.go Outdated
Comment thread internal/handlers/messages.go
Co-authored-by: kilo-code-bot[bot] <240665456+kilo-code-bot[bot]@users.noreply.github.com>
Signed-off-by: TUYIZERE Samuel <tuyizeres0@gmail.com>
Comment thread internal/handlers/messages.go Outdated
@samueltuyizere samueltuyizere merged commit 3b7d987 into routatic:main Jun 19, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants