Skip to content

fix: stabilize Anthropic-native streaming, timeout handling, and fallback cancellation#80

Draft
hungcuong9125 wants to merge 7 commits into
routatic:mainfrom
hungcuong9125:fix/streaming-fallback-anthropic-raw
Draft

fix: stabilize Anthropic-native streaming, timeout handling, and fallback cancellation#80
hungcuong9125 wants to merge 7 commits into
routatic:mainfrom
hungcuong9125:fix/streaming-fallback-anthropic-raw

Conversation

@hungcuong9125

Copy link
Copy Markdown

Summary

This PR completes the streaming and fallback stabilization work for oc-go-cc, focused on long-running Claude Code sessions routed through OpenCode Go / Zen, especially Anthropic-native models such as minimax-m3 and qwen3.x.

It consolidates the recent fixes from:

  • b573bc1 — stream Go Anthropic-native models via raw /v1/messages
  • 201aefd — harden transformTools against empty or malformed tool schemas
  • 3a8fef7 — emit tool_use content blocks correctly at index 0
  • 86bb0ff — make timeouts context-driven, stop false fallback on cancellation, and prevent Anthropic raw stream corruption

Problem

We were seeing multiple production issues during real Claude Code usage:

  1. Streaming requests failed after exactly 2 minutes due to a hardcoded timeout.
  2. Non-streaming fallback continued after the client request had already been canceled, causing false 502 all models failed responses.
  3. Long SSE streams could still be terminated locally by shorter global HTTP/server timeouts.
  4. Anthropic raw streaming for Go-provider Anthropic-native models could corrupt SSE output and surface to Claude Code as:
    • JSON Parse error: Unexpected EOF
    • proxy-side logs such as client disconnected during anthropic stream error="context canceled"

Root Causes

1. Hardcoded streaming timeout

handleStreaming used a hardcoded timeout derived from context.Background(), which:

  • ignored provider timeout config
  • ignored client request cancellation
  • cut off long tool-heavy streams too early

2. Fallback loop ignored request cancellation

ExecuteWithFallback continued trying more models even after the parent request context was already canceled.

That produced:

  • misleading fallback logs
  • unnecessary circuit-breaker penalties
  • false 502 all models failed responses on normal client cancellation

3. Streaming still depended on shorter transport/server limits

Even after fixing handler-level timeout logic, long streams could still be interrupted by:

  • http.Client.Timeout
  • http.Server.WriteTimeout

4. Anthropic raw SSE corruption

The proxy heartbeat was writing :keepalive\n\n to the same http.ResponseWriter while Anthropic raw streaming used io.Copy(w, resp.Body).

That allowed keepalive bytes to be injected between partial writes of an upstream SSE event, corrupting the event payload and causing Claude Code JSON parse failures.

What Changed

Added

  • streaming_timeout_ms support for both opencode_go and opencode_zen
  • per-provider timeout helpers:
    • RequestTimeout(model)
    • StreamingTimeout(model)
  • fallback handler tests for cancellation, deadline, circuit-breaker, and per-model timeout behavior

Fixed

  • streaming attempts now derive from the original client request context
  • per-model streaming timeout is configurable instead of hardcoded
  • non-streaming fallback stops immediately on parent context cancellation
  • parent cancellation and parent deadline no longer become false 502 all models failed
  • parent cancellation is no longer recorded as a model failure in circuit-breaker flow
  • Anthropic raw streaming pauses heartbeat writes during passthrough
  • response writes are serialized to reduce concurrent writer corruption risk
  • SSE responses are no longer cut off by a shorter server WriteTimeout
  • client transport no longer relies on a shorter global http.Client.Timeout

Changed

  • timeout config reload messaging now reflects immediate effect for:
    • OpenCode Go timeout_ms
    • OpenCode Go streaming_timeout_ms
    • OpenCode Zen timeout_ms
    • OpenCode Zen streaming_timeout_ms
  • example config now documents streaming_timeout_ms
  • streaming and non-streaming logs now distinguish cancellation from real upstream/model failure more clearly

Refactored

  • reduced over-detailed comments in touched files
  • kept JSON-based model replacement helper and trimmed surrounding explanation
  • preserved previous fixes for:
    • Go Anthropic-native raw /v1/messages routing
    • transformTools hardening
    • tool_use index handling at content block 0

Tests

Added or updated coverage for:

  • configurable streaming timeout
  • client-context-derived streaming cancellation
  • non-streaming parent cancellation without false 502
  • non-streaming parent deadline exceeded without false 502
  • per-model timeout fallback
  • circuit-breaker behavior under cancellation
  • Go/Zen timeout helper behavior
  • Anthropic raw stream heartbeat suppression
  • concurrent response-writer behavior

Validation

Validated with:

go test ./...

Expected behavior after this PR:

  • no exact 2-minute streaming cutoff
  • no false all models failed after normal client cancellation
  • long-running Anthropic-native streaming sessions can continue cleanly
  • no keepalive injection into raw Anthropic SSE payloads
  • previous Go-provider Anthropic-native streaming fixes remain intact

Investigation Config

{
  "api_key": "${OC_GO_CC_API_KEY}",
  "host": "127.0.0.1",
  "port": 3456,
  "hot_reload": true,
  "enable_streaming_scenario_routing": false,
  "respect_requested_model": false,
  "models": {
    "background": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-flash",
      "temperature": 0.2,
      "max_tokens": 4096,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "fast": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-flash",
      "temperature": 0.2,
      "max_tokens": 4096,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "default": {
      "provider": "opencode-go",
      "model_id": "mimo-v2.5",
      "temperature": 0.3,
      "max_tokens": 8192,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      }
    },
    "think": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-pro",
      "temperature": 0.1,
      "max_tokens": 8192,
      "reasoning_effort": "max",
      "thinking": {
        "type": "enabled"
      }
    },
    "complex": {
      "provider": "opencode-go",
      "model_id": "deepseek-v4-pro",
      "temperature": 0.1,
      "max_tokens": 8192,
      "reasoning_effort": "max",
      "thinking": {
        "type": "enabled"
      }
    },
    "long_context": {
      "provider": "opencode-go",
      "model_id": "minimax-m3",
      "temperature": 0.3,
      "max_tokens": 16384,
      "reasoning_effort": "high",
      "thinking": {
        "type": "enabled"
      },
      "context_threshold": 80000
    }
  },
  "fallbacks": {
    "background": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5",
        "temperature": 0.3,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "fast": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5",
        "temperature": 0.3,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "default": [
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-flash",
        "temperature": 0.2,
        "max_tokens": 4096,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-pro",
        "temperature": 0.1,
        "max_tokens": 8192,
        "reasoning_effort": "max",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "think": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "minimax-m3",
        "temperature": 0.3,
        "max_tokens": 16384,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "complex": [
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "minimax-m3",
        "temperature": 0.3,
        "max_tokens": 16384,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ],
    "long_context": [
      {
        "provider": "opencode-go",
        "model_id": "deepseek-v4-pro",
        "temperature": 0.1,
        "max_tokens": 8192,
        "reasoning_effort": "max",
        "thinking": {
          "type": "enabled"
        }
      },
      {
        "provider": "opencode-go",
        "model_id": "mimo-v2.5-pro",
        "temperature": 0.3,
        "max_tokens": 8192,
        "reasoning_effort": "high",
        "thinking": {
          "type": "enabled"
        }
      }
    ]
  },
  "opencode_go": {
    "base_url": "https://opencode.ai/zen/go/v1/chat/completions",
    "anthropic_base_url": "https://opencode.ai/zen/go/v1/messages",
    "timeout_ms": 300000
  },
  "opencode_zen": {
    "base_url": "https://opencode.ai/zen/v1/chat/completions",
    "anthropic_base_url": "https://opencode.ai/zen/v1/messages",
    "responses_base_url": "https://opencode.ai/zen/v1/responses",
    "gemini_base_url": "https://opencode.ai/zen/v1/models",
    "timeout_ms": 300000
  },
  "logging": {
    "level": "info",
    "requests": false
  }
}

Reviewer Notes

Please build and run a final smoke test before merging, especially for the long-context streaming path:

go test ./...
make build

Recommended manual verification:

  • long Claude Code streaming session with tools=29+
  • long_context -> minimax-m3
  • confirm no 2-minute cutoff
  • confirm no Unexpected EOF
  • confirm normal client disconnect does not produce false all models failed

hungcuong9125 and others added 4 commits June 18, 2026 21:13
The streaming path in handleStreaming was missing a branch for Go
provider models that use the Anthropic endpoint (minimax-m3,
qwen3.5-plus, qwen3.6-plus, qwen3.7-plus). These requests fell
through to the OpenAI transform path, which sent the body to
the Anthropic /v1/messages endpoint with OpenAI-formatted tools
({"type":"function","function":{...}}). The MiniMax upstream
rejects this with 400 'function name or parameters is empty (2013)'.

Mirrors the non-streaming path: send the raw Anthropic body to
https://opencode.ai/zen/go/v1/messages, swapping the model field
so the routed model (not the user-requested claude-* alias) reaches
upstream.

Also hardens replaceModelInRawBody: previous string-search
implementation only matched the compact form '"model":"<id>"'
and silently passed through any whitespace-padded form
'"model": "<id>"', which would cause the wrong model to reach
upstream. New JSON-based implementation handles both forms and
falls back to the original body on parse failure (missing model
key, invalid JSON, non-string model value).

Adds regression tests:
- TestHandleStreaming_GoAnthropicModel_SendsRawAnthropicBody
- TestHandleStreaming_GoAnthropicModel_FallsThroughOnError
- TestHandleMessages_StreamingMinimaxM3_UsesAnthropicEndpoint
- TestHandleNonStreaming_GoAnthropicModel_ReplacesModelInBody
- TestReplaceModelInRawBody_{JSONBased,HandlesWhitespace,
  ReturnsOriginalWhenModelMissing,ReturnsOriginalOnInvalidJSON,
  HandlesNestedObjects}

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
OpenAI-compatible upstreams (kimi, glm, mimo, qwen) reject requests
whose tools[].function.parameters is an empty object or missing
required JSON Schema fields. The previous transformTools emitted
parameters: {} for any tool with an empty or absent input_schema,
and forwarded malformed schemas verbatim when JSON parsing failed.

New transformTools:
- Skips tools whose name is empty or whitespace-only
- For empty / null / '{}' input_schema, emits the canonical
  {"type":"object","properties":{},"additionalProperties":false}
- For valid schemas missing 'type' or 'properties', adds them
  defensively (matches what most upstreams expect)
- For invalid JSON schemas, falls back to the canonical default
  rather than forwarding broken bytes
- Preserves additionalProperties when the caller explicitly set it

Adds regression tests:
- TestTransformTools_SkipsEmptyName
- TestTransformTools_SkipsWhitespaceOnlyName
- TestTransformTools_FillsEmptySchema
- TestTransformTools_FillsNullSchema
- TestTransformTools_FillsEmptyObjectSchema
- TestTransformTools_FillsMissingType
- TestTransformTools_FillsMissingProperties
- TestTransformTools_RecoversFromInvalidJSON
- TestTransformTools_PreservesValidSchema
- TestTransformTools_PreservesAdditionalPropertiesWhenSet

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
When the assistant's first output is a direct tool call (no
preceding text or reasoning), the previous code emitted
content_block_start with index 1 because *contentIndex++ ran
before blockIdx = *contentIndex. Anthropic's SSE spec requires
content block indices to start at 0 and be sequential; an
index-1 block without an index-0 block causes the SDK to
reject the response with InvalidHTTPResponse.

The fix in processSSELine:
- When closing an open text/reasoning block before starting a
  tool_use, advance contentIndex past the closed block first
- Then set blockIdx = *contentIndex (the next free index)
- Then advance contentIndex again to reserve for the new block

For the first-block case (no preceding text), the if-block is
skipped, blockIdx is the current *contentIndex (0), and the
new tool_use lands at index 0 as required.

Existing tests TestProxyStream_SingleToolCall and
TestProxyStream_ToolCallAndFinishReasonInSameChunk are updated
to assert the new index-0 behavior. Adds a dedicated
TestProxyStream_ToolUseFirstContentBlock that exercises the
previously-broken path end-to-end.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
Added
- provider-level streaming timeout configuration for OpenCode Go and OpenCode Zen
- request timeout helpers for per-model request and streaming attempts
- fallback handler coverage for cancellation, deadline, and timeout behavior

Fixed
- streaming requests now derive attempt timeouts from the client request context
- non-streaming fallback stops cleanly on parent cancellation instead of returning false 502 all models failed responses
- long-running SSE responses are no longer capped by a shorter server write timeout or global http client timeout
- Anthropic raw streaming pauses proxy keepalives and serializes writes to avoid SSE corruption and JSON parse EOF failures
- hot reload timeout warnings now reflect immediate effect for Go and Zen timeout updates

Changed
- example config documents streaming_timeout_ms for both upstream providers
- streaming and non-streaming handlers now distinguish client cancellation from upstream model failure in logs

Refactored
- shortened over-detailed comments in touched handlers, transformer, client, and config code
- replaceModelInRawBody keeps using JSON-based replacement with a smaller surface explanation

Tests
- added timeout helper tests for Go and Zen providers
- added fallback tests for canceled contexts, parent deadlines, circuit-breaker accounting, and per-model timeout fallback
- added handler tests for configurable streaming timeout, cancellation-aware fallback, Anthropic raw passthrough, and concurrent writer behavior
- kept full go test ./... green before commit

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86bb0ff45e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +288 to +289
if heartbeatPaused.Load() == 1 {
continue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Flush raw Anthropic streams while suppressing heartbeats

When the selected model uses the raw Anthropic path, handleAnthropicStreaming just io.Copys resp.Body into rw and never calls Flush; before this change the periodic heartbeat still flushed the underlying writer every 3s. With this new continue, those ticks do nothing while raw passthrough is active, so normal small SSE events can remain buffered by net/http until the buffer fills or the handler returns, making native Anthropic streaming appear hung and risking downstream idle timeouts. Suppress injected bytes, but still flush after copied chunks or with a flush-only ticker.

Useful? React with 👍 / 👎.

// Start heartbeat
var finished int32
heartbeatDone := make(chan struct{})
heartbeatPaused := new(atomic.Int32)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Pausing the heartbeat is the right instinct, but this flag only guards raw Anthropic io.Copy; any future raw passthrough can trip the same landmine.

🩹 The Fix: Wrap any raw stream copy with a helper that pauses heartbeat for the whole copy, instead of manually setting the flag around one endpoint.

📏 Severity: suggestion

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

defer w.mu.Unlock()
if !w.wroteHeader {
w.WriteHeader(http.StatusOK)
w.wroteHeader = true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Setting wroteHeader under the mutex is a good start, but later reads still peek at it bare-handed. The heartbeat goroutine can update this flag while handleStreaming checks it, so the race detector gets a front-row seat.

🩹 The Fix: Add a locked helper like HasWrittenHeader() on responseWriter, use it for the later checks, and keep direct wroteHeader access out of concurrent paths.

📏 Severity: warning

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

schema = []byte(`{"type":"object","properties":{},"additionalProperties":false}`)
default:
var schemaObj map[string]interface{}
if err := json.Unmarshal(schema, &schemaObj); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: This catches invalid JSON, but valid JSON with the wrong shape can still waltz through. A schema like {"type":"string"} or {"properties":[]} will unmarshal happily and get forwarded as tool parameters that OpenAI can reject.

🩹 The Fix: After unmarshalling, validate that type is "object" and properties is an object; otherwise replace with the safe empty-object schema instead of preserving the bad shape.

📏 Severity: suggestion

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

@kilo-code-bot

kilo-code-bot Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Code Review Roast 🔥

Verdict: Previous Issues Resolved | Recommendation: No new concerns in incremental diff

Overview

Severity Count
🚨 critical 0
⚠️ warning 0
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Previous Issue Status
internal/handlers/messages.go Streaming timeout only covered request startup; mid-stream stalls could ignore per-model deadline ✅ Fixed — upstream body now wrapped with NewCtxReadCloser(attemptCtx, ...) so streaming_timeout_ms aborts reads mid-stream
internal/transformer/request.go Whitespace-padded JSON " null " unmarshaled to nil map, causing nil-pointer panic on field assignments ✅ Fixed — schemaObj == nil guard added, falls back to default object schema safely

🏆 Best part: Holy shit, this PR actually fixed the timeout loophole properly. I need to sit down. The ctxio wrapper is clean, well-tested, and exactly what should have existed from day one. I'm genuinely shocked — I had my sarcasm fully loaded and I don't even get to use it.

💀 Worst part: Nothing. For real. I've been looking for something to roast and the only viable target is the fact that the fix is obvious in retrospect, which means the original time-bomb was avoidable. But that's on the old code, not this one.

📊 Overall: Like finding out the guy who built a Death Star actually came back and installed a proper fire suppression system. Rare, welcome, and slightly suspicious that it took a near-miss to make it happen.

Files Reviewed (6 files)
  • internal/handlers/messages.go — previous issues fixed
  • internal/transformer/ctxio.go — new, clean
  • internal/transformer/ctxio_test.go — new, thorough
  • internal/transformer/request.go — previous issue fixed
  • internal/transformer/request_test.go — regression test added
  • internal/transformer/stream.go — parameter rename consistent

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Previous Review Summaries (3 snapshots, latest commit d6b4cf9)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit d6b4cf9)

Verdict: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 0
⚠️ warning 2
💡 suggestion 0
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 427 StreamingTimeoutMs only covers request startup; ProxyStream still reads on clientCtx, so a stalled mid-stream response can ignore the per-model timeout. Same gap applies to raw Anthropic io.Copy and Responses/Gemini streaming paths.
internal/transformer/request.go 605 Valid JSON like whitespace-padded null can leave schemaObj nil, then the properties assignment panics instead of falling back safely.

🏆 Best part: The cancellation handling and circuit-breaker accounting are genuinely solid; I hate admitting that because it makes the remaining timeout gap look even more avoidable.

💀 Worst part: A "per-model streaming timeout" that stops caring once headers arrive is the production incident version of a smoke alarm that only works before the fire starts.

📊 Overall: Strong stabilization PR with two sharp edges left: one timeout loophole and one nil-map panic waiting in the schema sanitizer.

Fix these issues in Kilo Cloud: https://app.kilo.ai/cloud-agent-fork/review/c6dadec2-4873-4951-8107-e9c03c9c34f6

Files Reviewed (16 files)
  • CONFIGURATION.md - no issues
  • README.md - no issues
  • configs/config.example.json - no issues
  • internal/client/opencode.go - no issues
  • internal/client/opencode_test.go - no issues
  • internal/config/atomic.go - no issues
  • internal/config/config.go - no issues
  • internal/handlers/messages.go - 1 issue
  • internal/handlers/messages_test.go - no issues
  • internal/router/fallback.go - no issues
  • internal/router/fallback_test.go - no issues
  • internal/server/server.go - no issues
  • internal/transformer/request.go - 1 issue
  • internal/transformer/request_test.go - no issues
  • internal/transformer/stream.go - no issues
  • internal/transformer/stream_test.go - no issues

Previous review

Verdict: 0 Issues Found | Recommendation: Merge

All previous issues have been addressed in this PR.

Fixes Applied

File Line Previous Issue Resolution
internal/handlers/messages.go 61 wroteHeader race with heartbeat goroutine Added mutex-protected HasWrittenHeader() method (line 70-74)
internal/handlers/messages.go 303 Heartbeat injection during raw Anthropic stream Heartbeat now paused via heartbeatPaused flag in handleAnthropicStreaming (line 580)
internal/transformer/request.go 588 Schema validation only checked JSON validity Added validation that type is "object" and properties is an object (lines 591-607)

🏆 Best part: The circuit breaker logic now correctly distinguishes between client-initiated cancellations (which don't poison the breaker) and real upstream failures. This is the kind of production hygiene that keeps incidents from snowballing.

💀 Worst part: None — the sharpest splinters have been sanded down. Even the SSE event index was rejiggered to start tool_use blocks at 0 as the spec demands.

📊 Overall: Like a master craftsman's final pass — the rough edges are gone and the whole thing gleams. The tests added here would make the previous reviewer blush with pride.

Files Reviewed (8 files)
  • internal/handlers/messages.go - 0 issues (3 fixed)
  • internal/handlers/messages_test.go - no issues (new tests)
  • internal/router/fallback.go - 0 issues (context cancellation added)
  • internal/router/fallback_test.go - no issues (new tests)
  • internal/server/server.go - no issues (WriteTimeout change)
  • internal/transformer/request.go - 0 issues (schema validation fixed)
  • internal/transformer/request_test.go - no issues (new tests)
  • internal/transformer/stream.go - no issues (index fix verified)
  • internal/transformer/stream_test.go - no issues (updated tests)

Previous review (commit 86bb0ff)

Verdict: 3 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 0
⚠️ warning 1
💡 suggestion 2
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
internal/handlers/messages.go 277 Heartbeat pause is scoped only around raw Anthropic copying, so future raw passthrough paths can trip the same injection hazard.
internal/handlers/messages.go 61 wroteHeader is written under a mutex but read later without locking, creating a data race with the heartbeat goroutine.
internal/transformer/request.go 588 Valid JSON schemas with the wrong shape can still pass through as OpenAI tool parameters and get rejected upstream.

🏆 Best part: The cancellation handling is genuinely tidy: fallback stops on parent cancellation and avoids poisoning circuit breakers with client disconnects, which is the kind of production hygiene I wish I saw more often.

💀 Worst part: The response writer race is the sharpest splinter here; you built a mutex moat, then left wroteHeader as an unlocked drawbridge.

📊 Overall: Strong production fix with a few concurrency and validation potholes — like a sports car with excellent brakes and one suspicious tire.

Fix these issues in Kilo Cloud: https://app.kilo.ai/cloud-agent-fork/review/0f34b0d6-52cb-4b57-9ae1-e392a3b697eb

Files Reviewed (14 files)
  • configs/config.example.json
  • internal/client/opencode.go
  • internal/client/opencode_test.go
  • internal/config/atomic.go
  • internal/config/config.go
  • internal/handlers/messages.go
  • internal/handlers/messages_test.go
  • internal/router/fallback.go
  • internal/router/fallback_test.go
  • internal/server/server.go
  • internal/transformer/request.go
  • internal/transformer/request_test.go
  • internal/transformer/stream.go
  • internal/transformer/stream_test.go

Reviewed by step-3.7-flash-20260528 · 513,444 tokens

…m flushing, heartbeat scoping

Fixed
- Data race on responseWriter.wroteHeader: add mutex lock in WriteHeader
  and a locked HasWrittenHeader() accessor for concurrent reads.
- Tool schema shape validation: verify "type" is "object" and "properties"
  is an object before forwarding to upstream OpenAI-compatible endpoints.
- Raw Anthropic SSE stream buffering: wrap io.Copy with a flushWriter that
  calls Flush after every write, preventing hung streams during heartbeat
  suppression.

Refactored
- Scoped heartbeat pause into handleAnthropicStreaming itself (Store(1) /
  defer Store(0)), removing duplicated Store calls from both call sites
  so future raw passthrough paths cannot forget to suppress heartbeats.

Co-Authored-By: kilocode-review[bot] <kilocode-review[bot]@users.noreply.github.com>
Co-Authored-By: ChatGPT-Codex-Connector[bot] <chatgpt-codex-connector[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hungcuong9125 hungcuong9125 marked this pull request as draft June 18, 2026 22:30
…flows

Claude Code review workflows can dispatch many subagents and tool calls while still using streaming requests.

When enable_streaming_scenario_routing is disabled, these requests may be routed to the fast scenario, even if they are tool-heavy. This can cause unreliable behavior with fast models during parallel Agent/tool orchestration.

Document that users should enable enable_streaming_scenario_routing for /review-code, /ultracode, multi-agent dispatch, large diffs, and long-context review sessions so streaming requests can route to complex or long_context models.
@hungcuong9125 hungcuong9125 marked this pull request as ready for review June 19, 2026 07:09
}

streamBody, err := h.client.GetStreamingBody(ctx, model.ModelID, openaiReq, model)
streamBody, err := h.client.GetStreamingBody(attemptCtx, model.ModelID, openaiReq, model)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: StreamingTimeoutMs is doing a great impression of a timeout until the response headers arrive; after GetStreamingBody returns, ProxyStream is still reading on clientCtx, so a mid-stream stall can sit there forever like a guest who ignores the last-call bell.

🩹 The Fix: Bind the stream-body read to the per-model attempt context, not just request startup. Pass attemptCtx (or a context-bound reader/wrapper) into ProxyStream, and apply the same treatment to the raw Anthropic io.Copy plus Responses/Gemini streaming paths.

📏 Severity: warning

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

Comment thread internal/transformer/request.go Outdated
…l schema

Two warnings from kilo-code-bot:

1. StreamingTimeoutMs only guards request startup; once GetStreamingBody
   returns, the body read was tied to the request context (no timeout),
   so a mid-stream stall could sit forever. Pass the per-model attempt
   context into ProxyStream/ProxyResponsesStream/ProxyGeminiStream and
   the raw Anthropic io.Copy, and wrap the upstream body with a tiny
   ctxio.NewCtxReadCloser so the body Read also respects the deadline.

2. transformTools panicked on valid JSON that unmarshals to a nil map
   (e.g. " null " with decorative whitespace). Treat that case the same
   as a successful parse of "{}" — fall back to the default schema.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hungcuong9125 hungcuong9125 marked this pull request as draft June 19, 2026 08:58
@samueltuyizere

Copy link
Copy Markdown
Collaborator

Thanks a lot for working on this @hungcuong9125 , let me know if you need help with these merge conflicts since I introduced most of them and I'd be happy to fix them.

@jremen

jremen commented Jun 19, 2026

Copy link
Copy Markdown

@hungcuong9125
Will this PR resolve this issue?
#62

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants