Problem
Claude Code v2.1.148 with the interleaved-thinking-2025-05-14 beta enabled shows empty responses when using kimi-k2.6. The proxy receives a valid response from upstream and streams it back, but Claude Code silently discards it.
kimi-k2.5 works fine (no thinking blocks). kimi-k2.6 returns reasoning_content in every chunk.
Root causes identified
1. Thinking blocks require signature_delta SSE event
Claude Code's extended thinking beta requires a content_block_delta event with type: "signature_delta" before content_block_stop for each thinking block. Without it, the entire response is discarded.
Anthropic streaming format for extended thinking:
content_block_start {type: "thinking"}
content_block_delta {type: "thinking_delta", thinking: "..."} ← repeated
content_block_delta {type: "signature_delta", signature: "..."} ← REQUIRED before stop
content_block_stop
The proxy currently emits content_block_stop immediately after thinking deltas, skipping the signature_delta step.
2. Two separate message_delta events instead of one
The proxy emits:
message_delta {stop_reason: "end_turn"} ← first (no usage)
message_delta {usage: {input_tokens: N, ...}} ← second (no stop_reason)
message_stop
Anthropic spec requires a single message_delta with both stop_reason and usage merged:
message_delta {stop_reason: "end_turn", usage: {input_tokens: N, ...}}
message_stop
3. kimi-k2.6 rejects thinking and reasoning_effort API parameters
kimi-k2.6 returns thinking blocks naturally (via reasoning_content in chunks) but does not accept the OpenAI-style thinking: {type: "enabled"} or reasoning_effort parameters in the request body — it returns HTTP 400.
The proxy's HasThinkingBlocks() check fires on the second turn (because the first response contained thinking), and the proxy then sends these parameters → 400 → circuit breaker opens → all subsequent requests fail.
Fix: only send thinking/reasoning_effort API params for DeepSeek models or models with explicit config, not for kimi.
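A rough sketch of such a gate, assuming model IDs are plain strings and that explicit opt-in is available as a set built from config. The function signature, the substring match on "deepseek", and the model ID deepseek-v3 are illustrative assumptions, not the proxy's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// supportsThinkingAPI reports whether the thinking / reasoning_effort
// request parameters may be sent upstream for the given model.
// explicit is a hypothetical set of model IDs opted in through config.
func supportsThinkingAPI(model string, explicit map[string]bool) bool {
	if explicit[model] {
		return true
	}
	// Assumption: DeepSeek models can be recognized by name.
	return strings.Contains(strings.ToLower(model), "deepseek")
}

func main() {
	explicit := map[string]bool{} // nothing opted in by config
	models := []string{
		"accounts/fireworks/models/kimi-k2p6",
		"deepseek-v3", // hypothetical model ID for illustration
	}
	for _, m := range models {
		fmt.Printf("%s -> send thinking params: %v\n", m, supportsThinkingAPI(m, explicit))
	}
}
```

An allow-list like this defaults to omitting the parameters, so a model that is not yet known to accept them cannot trigger the 400 and trip the circuit breaker.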
Minimal reproduction
# First turn works, second turn fails (thinking history in messages)
curl -X POST http://127.0.0.1:3456/v1/messages \
  -H 'anthropic-beta: interleaved-thinking-2025-05-14' \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 200,
    "stream": true,
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "messages": [
      {"role": "user", "content": "what is 2+2?"},
      {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "Let me compute", "signature": "abc"},
        {"type": "text", "text": "4"}
      ]},
      {"role": "user", "content": "and 3+3?"}
    ]
  }'
# → HTTP 400 from kimi-k2.6
Suggested fixes
stream.go:
- Before each content_block_stop for a thinking block, emit a signature_delta event with a non-empty placeholder signature
- Merge stop_reason and usage into a single message_delta event
request.go:
- Add an isKimiModel() or supportsThinkingAPI() check: only send thinking/reasoning_effort params for DeepSeek or explicitly configured models, not for kimi
Environment
- Claude Code v2.1.148
- oc-go-cc built from current master
- kimi-k2.6 via OpenCode Go (accounts/fireworks/models/kimi-k2p6)