fix: stabilize Anthropic-native streaming, timeout handling, and fallback cancellation by hungcuong9125 · Pull Request #80 · routatic/proxy

hungcuong9125 · 2026-06-18T18:32:52Z

Summary

This PR completes the streaming and fallback stabilization work for oc-go-cc, focused on long-running Claude Code sessions routed through OpenCode Go / Zen, especially Anthropic-native models such as minimax-m3 and qwen3.x.

It consolidates the recent fixes from:

b573bc1 — stream Go Anthropic-native models via raw /v1/messages
201aefd — harden transformTools against empty or malformed tool schemas
3a8fef7 — emit tool_use content blocks correctly at index 0
86bb0ff — make timeouts context-driven, stop false fallback on cancellation, and prevent Anthropic raw stream corruption

Problem

We were seeing multiple production issues during real Claude Code usage:

Streaming requests failed after exactly 2 minutes due to a hardcoded timeout.
Non-streaming fallback continued after the client request had already been canceled, causing false 502 all models failed responses.
Long SSE streams could still be terminated locally by shorter global HTTP/server timeouts.
Anthropic raw streaming for Go-provider Anthropic-native models could corrupt SSE output and surface to Claude Code as:
- JSON Parse error: Unexpected EOF
- proxy-side logs such as client disconnected during anthropic stream error="context canceled"

Root Causes

1. Hardcoded streaming timeout

handleStreaming used a hardcoded timeout derived from context.Background(), which:

ignored provider timeout config
ignored client request cancellation
cut off long tool-heavy streams too early

2. Fallback loop ignored request cancellation

ExecuteWithFallback continued trying more models even after the parent request context was already canceled.

That produced:

misleading fallback logs
unnecessary circuit-breaker penalties
false 502 all models failed responses on normal client cancellation

3. Streaming still depended on shorter transport/server limits

Even after fixing handler-level timeout logic, long streams could still be interrupted by:

http.Client.Timeout
http.Server.WriteTimeout

4. Anthropic raw SSE corruption

The proxy heartbeat was writing :keepalive\n\n to the same http.ResponseWriter while Anthropic raw streaming used io.Copy(w, resp.Body).

That allowed keepalive bytes to be injected between partial writes of an upstream SSE event, corrupting the event payload and causing Claude Code JSON parse failures.

What Changed

Added

streaming_timeout_ms support for both opencode_go and opencode_zen
per-provider timeout helpers:
- RequestTimeout(model)
- StreamingTimeout(model)
fallback handler tests for cancellation, deadline, circuit-breaker, and per-model timeout behavior

Fixed

streaming attempts now derive from the original client request context
per-model streaming timeout is configurable instead of hardcoded
non-streaming fallback stops immediately on parent context cancellation
parent cancellation and parent deadline no longer become false 502 all models failed
parent cancellation is no longer recorded as a model failure in circuit-breaker flow
Anthropic raw streaming pauses heartbeat writes during passthrough
response writes are serialized to reduce concurrent writer corruption risk
SSE responses are no longer cut off by a shorter server WriteTimeout
client transport no longer relies on a shorter global http.Client.Timeout

Changed

timeout config reload messaging now reflects immediate effect for:
- OpenCode Go timeout_ms
- OpenCode Go streaming_timeout_ms
- OpenCode Zen timeout_ms
- OpenCode Zen streaming_timeout_ms
example config now documents streaming_timeout_ms
streaming and non-streaming logs now distinguish cancellation from real upstream/model failure more clearly

Refactored

reduced over-detailed comments in touched files
kept JSON-based model replacement helper and trimmed surrounding explanation
preserved previous fixes for:
- Go Anthropic-native raw /v1/messages routing
- transformTools hardening
- tool_use index handling at content block 0

Tests

Added or updated coverage for:

configurable streaming timeout
client-context-derived streaming cancellation
non-streaming parent cancellation without false 502
non-streaming parent deadline exceeded without false 502
per-model timeout fallback
circuit-breaker behavior under cancellation
Go/Zen timeout helper behavior
Anthropic raw stream heartbeat suppression
concurrent response-writer behavior

Validation

Validated with:

go test ./...

Expected behavior after this PR:

no exact 2-minute streaming cutoff
no false all models failed after normal client cancellation
long-running Anthropic-native streaming sessions can continue cleanly
no keepalive injection into raw Anthropic SSE payloads
previous Go-provider Anthropic-native streaming fixes remain intact

Investigation Config

{
  "api_key": "${OC_GO_CC_API_KEY}",
  "host": "127.0.0.1",
  "port": 3456,
  "hot_reload": true,
  "enable_streaming_scenario_routing": false,
  "respect_requested_model": false,
  "models": {
    "background": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-flash",
      "temperature": 0.2,
      "max_tokens": 4096,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "fast": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-flash",
      "temperature": 0.2,
      "max_tokens": 4096,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "default": {
      "provider": "opencode-go",
      "model_id": "mimo-v2.5",
      "temperature": 0.3,
      "max_tokens": 8192,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "think": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-pro",
      "temperature": 0.1,
      "max_tokens": 8192,
      "reasoning_effort": "max",
      "thinking": {
        "type": "enabled"
      }
    },
    "complex": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-pro",
      "temperature": 0.1,
      "max_tokens": 8192,
      "reasoning_effort": "max",
      "thinking": {
        "type": "enabled"
      }
    },
    "long_context": {
      "provider": "opencode-go",
      "model_id": "minimax-m3",
      "temperature": 0.3,
      "max_tokens": 16384,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      },
      "context_threshold": 80000
    }
  },
  "fallbacks": {
    "background": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5",
        "temperature": 0.3,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "fast": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5",
        "temperature": 0.3,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "default": [
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-flash",
        "temperature": 0.2,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-pro",
        "temperature": 0.1,
        "max_tokens": 8192,
        "reasoning_effort": "max",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "think": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "minimax-m3",
        "temperature": 0.3,
        "max_tokens": 16384,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "complex": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "minimax-m3",
        "temperature": 0.3,
        "max_tokens": 16384,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "long_context": [
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-pro",
        "temperature": 0.1,
        "max_tokens": 8192,
        "reasoning_effort": "max",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ]
  },
  "opencode_go": {
    "base_url": "https://opencode.ai/zen/go/v1/chat/completions",
    "anthropic_base_url": "https://opencode.ai/zen/go/v1/messages",
    "timeout_ms": 300000
  },
  "opencode_zen": {
    "base_url": "https://opencode.ai/zen/v1/chat/completions",
    "anthropic_base_url": "https://opencode.ai/zen/v1/messages",
    "responses_base_url": "https://opencode.ai/zen/v1/responses",
    "gemini_base_url": "https://opencode.ai/zen/v1/models",
    "timeout_ms": 300000
  },
  "logging": {
    "level": "info",
    "requests": false
  }
}

Reviewer Notes

Please build and run a final smoke test before merging, especially for the long-context streaming path:

go test ./...
make build

Recommended manual verification:

long Claude Code streaming session with tools=29+
long_context -> minimax-m3
confirm no 2-minute cutoff
confirm no Unexpected EOF
confirm normal client disconnect does not produce false all models failed

The streaming path in handleStreaming was missing a branch for Go provider models that use the Anthropic endpoint (minimax-m3, qwen3.5-plus, qwen3.6-plus, qwen3.7-plus). These requests fell through to the OpenAI transform path, which sent the body to the Anthropic /v1/messages endpoint with OpenAI-formatted tools ({"type":"function","function":{...}}). The MiniMax upstream rejects this with 400 'function name or parameters is empty (2013)'. Mirrors the non-streaming path: send the raw Anthropic body to https://opencode.ai/zen/go/v1/messages, swapping the model field so the routed model (not the user-requested claude-* alias) reaches upstream. Also hardens replaceModelInRawBody: previous string-search implementation only matched the compact form '"model":"<id>"' and silently passed through any whitespace-padded form '"model": "<id>"', which would cause the wrong model to reach upstream. New JSON-based implementation handles both forms and falls back to the original body on parse failure (missing model key, invalid JSON, non-string model value). Adds regression tests: - TestHandleStreaming_GoAnthropicModel_SendsRawAnthropicBody - TestHandleStreaming_GoAnthropicModel_FallsThroughOnError - TestHandleMessages_StreamingMinimaxM3_UsesAnthropicEndpoint - TestHandleNonStreaming_GoAnthropicModel_ReplacesModelInBody - TestReplaceModelInRawBody_{JSONBased,HandlesWhitespace, ReturnsOriginalWhenModelMissing,ReturnsOriginalOnInvalidJSON, HandlesNestedObjects} Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

OpenAI-compatible upstreams (kimi, glm, mimo, qwen) reject requests whose tools[].function.parameters is an empty object or missing required JSON Schema fields. The previous transformTools emitted parameters: {} for any tool with an empty or absent input_schema, and forwarded malformed schemas verbatim when JSON parsing failed. New transformTools: - Skips tools whose name is empty or whitespace-only - For empty / null / '{}' input_schema, emits the canonical {"type":"object","properties":{},"additionalProperties":false} - For valid schemas missing 'type' or 'properties', adds them defensively (matches what most upstreams expect) - For invalid JSON schemas, falls back to the canonical default rather than forwarding broken bytes - Preserves additionalProperties when the caller explicitly set it Adds regression tests: - TestTransformTools_SkipsEmptyName - TestTransformTools_SkipsWhitespaceOnlyName - TestTransformTools_FillsEmptySchema - TestTransformTools_FillsNullSchema - TestTransformTools_FillsEmptyObjectSchema - TestTransformTools_FillsMissingType - TestTransformTools_FillsMissingProperties - TestTransformTools_RecoversFromInvalidJSON - TestTransformTools_PreservesValidSchema - TestTransformTools_PreservesAdditionalPropertiesWhenSet Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

When the assistant's first output is a direct tool call (no preceding text or reasoning), the previous code emitted content_block_start with index 1 because *contentIndex++ ran before blockIdx = *contentIndex. Anthropic's SSE spec requires content block indices to start at 0 and be sequential; an index-1 block without an index-0 block causes the SDK to reject the response with InvalidHTTPResponse. The fix in processSSELine: - When closing an open text/reasoning block before starting a tool_use, advance contentIndex past the closed block first - Then set blockIdx = *contentIndex (the next free index) - Then advance contentIndex again to reserve for the new block For the first-block case (no preceding text), the if-block is skipped, blockIdx is the current *contentIndex (0), and the new tool_use lands at index 0 as required. Existing tests TestProxyStream_SingleToolCall and TestProxyStream_ToolCallAndFinishReasonInSameChunk are updated to assert the new index-0 behavior. Adds a dedicated TestProxyStream_ToolUseFirstContentBlock that exercises the previously-broken path end-to-end. Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Added - provider-level streaming timeout configuration for OpenCode Go and OpenCode Zen - request timeout helpers for per-model request and streaming attempts - fallback handler coverage for cancellation, deadline, and timeout behavior Fixed - streaming requests now derive attempt timeouts from the client request context - non-streaming fallback stops cleanly on parent cancellation instead of returning false 502 all models failed responses - long-running SSE responses are no longer capped by a shorter server write timeout or global http client timeout - Anthropic raw streaming pauses proxy keepalives and serializes writes to avoid SSE corruption and JSON parse EOF failures - hot reload timeout warnings now reflect immediate effect for Go and Zen timeout updates Changed - example config documents streaming_timeout_ms for both upstream providers - streaming and non-streaming handlers now distinguish client cancellation from upstream model failure in logs Refactored - shortened over-detailed comments in touched handlers, transformer, client, and config code - replaceModelInRawBody keeps using JSON-based replacement with a smaller surface explanation Tests - added timeout helper tests for Go and Zen providers - added fallback tests for canceled contexts, parent deadlines, circuit-breaker accounting, and per-model timeout fallback - added handler tests for configurable streaming timeout, cancellation-aware fallback, Anthropic raw passthrough, and concurrent writer behavior - kept full go test ./... green before commit

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86bb0ff45e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-18T18:39:26Z

+				if heartbeatPaused.Load() == 1 {
+					continue


Flush raw Anthropic streams while suppressing heartbeats

When the selected model uses the raw Anthropic path, handleAnthropicStreaming just io.Copys resp.Body into rw and never calls Flush; before this change the periodic heartbeat still flushed the underlying writer every 3s. With this new continue, those ticks do nothing while raw passthrough is active, so normal small SSE events can remain buffered by net/http until the buffer fills or the handler returns, making native Anthropic streaming appear hung and risking downstream idle timeouts. Suppress injected bytes, but still flush after copied chunks or with a flush-only ticker.

Useful? React with 👍 / 👎.

kilo-code-bot · 2026-06-18T18:40:36Z

-	// Start heartbeat
 	var finished int32
 	heartbeatDone := make(chan struct{})
+	heartbeatPaused := new(atomic.Int32)


🔥 The Roast: Pausing the heartbeat is the right instinct, but this flag only guards raw Anthropic io.Copy; any future raw passthrough can trip the same landmine.

🩹 The Fix: Wrap any raw stream copy with a helper that pauses heartbeat for the whole copy, instead of manually setting the flag around one endpoint.

📏 Severity: suggestion

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

kilo-code-bot · 2026-06-18T18:41:30Z

+	defer w.mu.Unlock()
 	if !w.wroteHeader {
-		w.WriteHeader(http.StatusOK)
+		w.wroteHeader = true


🔥 The Roast: Setting wroteHeader under the mutex is a good start, but later reads still peek at it bare-handed. The heartbeat goroutine can update this flag while handleStreaming checks it, so the race detector gets a front-row seat.

🩹 The Fix: Add a locked helper like HasWrittenHeader() on responseWriter, use it for the later checks, and keep direct wroteHeader access out of concurrent paths.

📏 Severity: warning

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

kilo-code-bot · 2026-06-18T18:41:51Z

+			schema = []byte(`{"type":"object","properties":{},"additionalProperties":false}`)
+		default:
+			var schemaObj map[string]interface{}
+			if err := json.Unmarshal(schema, &schemaObj); err != nil {


🔥 The Roast: This catches invalid JSON, but valid JSON with the wrong shape can still waltz through. A schema like {"type":"string"} or {"properties":[]} will unmarshal happily and get forwarded as tool parameters that OpenAI can reject.

🩹 The Fix: After unmarshalling, validate that type is "object" and properties is an object; otherwise replace with the safe empty-object schema instead of preserving the bad shape.

📏 Severity: suggestion

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

kilo-code-bot · 2026-06-18T18:42:26Z

Code Review Roast 🔥

Verdict: Previous Issues Resolved | Recommendation: No new concerns in incremental diff

Overview

Severity	Count
🚨 critical	0
⚠️ warning	0
💡 suggestion	0
🤏 nitpick	0

Issue Details (click to expand)

File	Previous Issue	Status
`internal/handlers/messages.go`	Streaming timeout only covered request startup; mid-stream stalls could ignore per-model deadline	✅ Fixed — upstream body now wrapped with `NewCtxReadCloser(attemptCtx, ...)` so `streaming_timeout_ms` aborts reads mid-stream
`internal/transformer/request.go`	Whitespace-padded JSON `" null "` unmarshaled to nil map, causing nil-pointer panic on field assignments	✅ Fixed — `schemaObj == nil` guard added, falls back to default object schema safely

🏆 Best part: Holy shit, this PR actually fixed the timeout loophole properly. I need to sit down. The ctxio wrapper is clean, well-tested, and exactly what should have existed from day one. I'm genuinely shocked — I had my sarcasm fully loaded and I don't even get to use it.

💀 Worst part: Nothing. For real. I've been looking for something to roast and the only viable target is the fact that the fix is obvious in retrospect, which means the original time-bomb was avoidable. But that's on the old code, not this one.

📊 Overall: Like finding out the guy who built a Death Star actually came back and installed a proper fire suppression system. Rare, welcome, and slightly suspicious that it took a near-miss to make it happen.

Files Reviewed (6 files)

internal/handlers/messages.go — previous issues fixed
internal/transformer/ctxio.go — new, clean
internal/transformer/ctxio_test.go — new, thorough
internal/transformer/request.go — previous issue fixed
internal/transformer/request_test.go — regression test added
internal/transformer/stream.go — parameter rename consistent

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Previous Review Summaries (3 snapshots, latest commit d6b4cf9)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit `d6b4cf9`)

Verdict: 2 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
🚨 critical	0
⚠️ warning	2
💡 suggestion	0
🤏 nitpick	0

Issue Details (click to expand)

File	Line	Roast
`internal/handlers/messages.go`	427	`StreamingTimeoutMs` only covers request startup; `ProxyStream` still reads on `clientCtx`, so a stalled mid-stream response can ignore the per-model timeout. Same gap applies to raw Anthropic `io.Copy` and Responses/Gemini streaming paths.
`internal/transformer/request.go`	605	Valid JSON like whitespace-padded `null` can leave `schemaObj` nil, then the properties assignment panics instead of falling back safely.

🏆 Best part: The cancellation handling and circuit-breaker accounting are genuinely solid; I hate admitting that because it makes the remaining timeout gap look even more avoidable.

💀 Worst part: A "per-model streaming timeout" that stops caring once headers arrive is the production incident version of a smoke alarm that only works before the fire starts.

📊 Overall: Strong stabilization PR with two sharp edges left: one timeout loophole and one nil-map panic waiting in the schema sanitizer.

Fix these issues in Kilo Cloud: https://app.kilo.ai/cloud-agent-fork/review/c6dadec2-4873-4951-8107-e9c03c9c34f6

Files Reviewed (16 files)

CONFIGURATION.md - no issues
README.md - no issues
configs/config.example.json - no issues
internal/client/opencode.go - no issues
internal/client/opencode_test.go - no issues
internal/config/atomic.go - no issues
internal/config/config.go - no issues
internal/handlers/messages.go - 1 issue
internal/handlers/messages_test.go - no issues
internal/router/fallback.go - no issues
internal/router/fallback_test.go - no issues
internal/server/server.go - no issues
internal/transformer/request.go - 1 issue
internal/transformer/request_test.go - no issues
internal/transformer/stream.go - no issues
internal/transformer/stream_test.go - no issues

Previous review

Verdict: 0 Issues Found | Recommendation: Merge

All previous issues have been addressed in this PR.

Fixes Applied

File	Line	Previous Issue	Resolution
`internal/handlers/messages.go`	61	`wroteHeader` race with heartbeat goroutine	Added mutex-protected `HasWrittenHeader()` method (line 70-74)
`internal/handlers/messages.go`	303	Heartbeat injection during raw Anthropic stream	Heartbeat now paused via `heartbeatPaused` flag in `handleAnthropicStreaming` (line 580)
`internal/transformer/request.go`	588	Schema validation only checked JSON validity	Added validation that `type` is "object" and `properties` is an object (lines 591-607)

🏆 Best part: The circuit breaker logic now correctly distinguishes between client-initiated cancellations (which don't poison the breaker) and real upstream failures. This is the kind of production hygiene that keeps incidents from snowballing.

💀 Worst part: None — the sharpest splinters have been sanded down. Even the SSE event index was rejiggered to start tool_use blocks at 0 as the spec demands.

📊 Overall: Like a master craftsman's final pass — the rough edges are gone and the whole thing gleams. The tests added here would make the previous reviewer blush with pride.

Files Reviewed (8 files)

internal/handlers/messages.go - 0 issues (3 fixed)
internal/handlers/messages_test.go - no issues (new tests)
internal/router/fallback.go - 0 issues (context cancellation added)
internal/router/fallback_test.go - no issues (new tests)
internal/server/server.go - no issues (WriteTimeout change)
internal/transformer/request.go - 0 issues (schema validation fixed)
internal/transformer/request_test.go - no issues (new tests)
internal/transformer/stream.go - no issues (index fix verified)
internal/transformer/stream_test.go - no issues (updated tests)

Previous review (commit `86bb0ff`)

Verdict: 3 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
🚨 critical	0
⚠️ warning	1
💡 suggestion	2
🤏 nitpick	0

Issue Details (click to expand)

File	Line	Roast
`internal/handlers/messages.go`	277	Heartbeat pause is scoped only around raw Anthropic copying, so future raw passthrough paths can trip the same injection hazard.
`internal/handlers/messages.go`	61	`wroteHeader` is written under a mutex but read later without locking, creating a data race with the heartbeat goroutine.
`internal/transformer/request.go`	588	Valid JSON schemas with the wrong shape can still pass through as OpenAI tool parameters and get rejected upstream.

🏆 Best part: The cancellation handling is genuinely tidy: fallback stops on parent cancellation and avoids poisoning circuit breakers with client disconnects, which is the kind of production hygiene I wish I saw more often.

💀 Worst part: The response writer race is the sharpest splinter here; you built a mutex moat, then left wroteHeader as an unlocked drawbridge.

📊 Overall: Strong production fix with a few concurrency and validation potholes — like a sports car with excellent brakes and one suspicious tire.

Fix these issues in Kilo Cloud: https://app.kilo.ai/cloud-agent-fork/review/0f34b0d6-52cb-4b57-9ae1-e392a3b697eb

Files Reviewed (14 files)

configs/config.example.json
internal/client/opencode.go
internal/client/opencode_test.go
internal/config/atomic.go
internal/config/config.go
internal/handlers/messages.go
internal/handlers/messages_test.go
internal/router/fallback.go
internal/router/fallback_test.go
internal/server/server.go
internal/transformer/request.go
internal/transformer/request_test.go
internal/transformer/stream.go
internal/transformer/stream_test.go

_{Reviewed by step-3.7-flash-20260528 · 513,444 tokens}

…m flushing, heartbeat scoping Fixed - Data race on responseWriter.wroteHeader: add mutex lock in WriteHeader and a locked HasWrittenHeader() accessor for concurrent reads. - Tool schema shape validation: verify "type" is "object" and "properties" is an object before forwarding to upstream OpenAI-compatible endpoints. - Raw Anthropic SSE stream buffering: wrap io.Copy with a flushWriter that calls Flush after every write, preventing hung streams during heartbeat suppression. Refactored - Scoped heartbeat pause into handleAnthropicStreaming itself (Store(1) / defer Store(0)), removing duplicated Store calls from both call sites so future raw passthrough paths cannot forget to suppress heartbeats. Co-Authored-By: kilocode-review[bot] <kilocode-review[bot]@users.noreply.github.com> Co-Authored-By: ChatGPT-Codex-Connector[bot] <chatgpt-codex-connector[bot]@users.noreply.github.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…flows Claude Code review workflows can dispatch many subagents and tool calls while still using streaming requests. When enable_streaming_scenario_routing is disabled, these requests may be routed to the fast scenario, even if they are tool-heavy. This can cause unreliable behavior with fast models during parallel Agent/tool orchestration. Document that users should enable enable_streaming_scenario_routing for /review-code, /ultracode, multi-agent dispatch, large diffs, and long-context review sessions so streaming requests can route to complex or long_context models.

kilo-code-bot · 2026-06-19T07:20:22Z

 		}

-		streamBody, err := h.client.GetStreamingBody(ctx, model.ModelID, openaiReq, model)
+		streamBody, err := h.client.GetStreamingBody(attemptCtx, model.ModelID, openaiReq, model)


🔥 The Roast: StreamingTimeoutMs is doing a great impression of a timeout until the response headers arrive; after GetStreamingBody returns, ProxyStream is still reading on clientCtx, so a mid-stream stall can sit there forever like a guest who ignores the last-call bell.

🩹 The Fix: Bind the stream-body read to the per-model attempt context, not just request startup. Pass attemptCtx (or a context-bound reader/wrapper) into ProxyStream, and apply the same treatment to the raw Anthropic io.Copy plus Responses/Gemini streaming paths.

📏 Severity: warning

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

…l schema Two warnings from kilo-code-bot: 1. StreamingTimeoutMs only guards request startup; once GetStreamingBody returns, the body read was tied to the request context (no timeout), so a mid-stream stall could sit forever. Pass the per-model attempt context into ProxyStream/ProxyResponsesStream/ProxyGeminiStream and the raw Anthropic io.Copy, and wrap the upstream body with a tiny ctxio.NewCtxReadCloser so the body Read also respects the deadline. 2. transformTools panicked on valid JSON that unmarshals to a nil map (e.g. " null " with decorative whitespace). Treat that case the same as a successful parse of "{}" — fall back to the default schema. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

samueltuyizere · 2026-06-19T11:39:55Z

Thanks a lot for working on this @hungcuong9125 , let me know if you need help with these merge conflicts since I introduced most of them and I'd be happy to fix them.

jremen · 2026-06-19T18:18:59Z

@hungcuong9125
Will this PR resolve this issue?
#62

hungcuong9125 and others added 4 commits June 18, 2026 21:13

chatgpt-codex-connector Bot reviewed Jun 18, 2026

View reviewed changes

kilo-code-bot Bot reviewed Jun 18, 2026

View reviewed changes

hungcuong9125 marked this pull request as draft June 18, 2026 22:30

hungcuong9125 marked this pull request as ready for review June 19, 2026 07:09

kilo-code-bot Bot reviewed Jun 19, 2026

View reviewed changes

hungcuong9125 marked this pull request as draft June 19, 2026 08:58

Uh oh!

Conversation

hungcuong9125 commented Jun 18, 2026

Summary

Problem

Root Causes

1. Hardcoded streaming timeout

2. Fallback loop ignored request cancellation

3. Streaming still depended on shorter transport/server limits

4. Anthropic raw SSE corruption

What Changed

Added

Fixed

Changed

Refactored

Tests

Validation

Investigation Config

Reviewer Notes

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot Bot Jun 18, 2026

Choose a reason for hiding this comment

📏 Severity: suggestion

Uh oh!

kilo-code-bot Bot Jun 18, 2026

Choose a reason for hiding this comment

📏 Severity: warning

Uh oh!

kilo-code-bot Bot Jun 18, 2026

Choose a reason for hiding this comment

📏 Severity: suggestion

Uh oh!

kilo-code-bot Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Roast 🔥

Overview

Previous review (commit d6b4cf9)

Overview

Previous review

Fixes Applied

Previous review (commit 86bb0ff)

Overview

Uh oh!

kilo-code-bot Bot Jun 19, 2026

Choose a reason for hiding this comment

📏 Severity: warning

Uh oh!

Uh oh!

samueltuyizere commented Jun 19, 2026

Uh oh!

jremen commented Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kilo-code-bot Bot commented Jun 18, 2026 •

edited

Loading

Previous review (commit `d6b4cf9`)

Previous review (commit `86bb0ff`)