kimi-k2.6: empty response in Claude Code — thinking blocks require signature_delta + merged message_delta #51

@lt-value

Problem

Claude Code v2.1.148 with the interleaved-thinking-2025-05-14 beta enabled shows empty responses when using kimi-k2.6. The proxy receives a valid response from upstream and streams it back, but Claude Code silently discards it.

kimi-k2.5 works fine (no thinking blocks). kimi-k2.6 returns reasoning_content in every chunk.

Root causes identified

1. Thinking blocks require signature_delta SSE event

Claude Code's extended thinking beta requires a content_block_delta event with type: "signature_delta" before content_block_stop for each thinking block. Without it, the entire response is discarded.

Anthropic streaming format for extended thinking:

content_block_start  {type: "thinking"}
content_block_delta  {type: "thinking_delta", thinking: "..."}   ← repeated
content_block_delta  {type: "signature_delta", signature: "..."}  ← REQUIRED before stop
content_block_stop

The proxy currently emits content_block_stop immediately after thinking deltas, skipping the signature_delta step.
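A minimal sketch of the corrected ordering in Go, assuming the proxy builds SSE events as structs before writing them. The names `event`, `delta`, `thinkingCloseEvents`, `writeSSE`, and the placeholder signature value are hypothetical and not taken from stream.go. The event and field names follow the format listed above, with the signature nested under `delta` as in Anthropic's wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// placeholderSignature is an assumed value; the only requirement stated
// in this issue is that the signature is non-empty.
const placeholderSignature = "proxy-placeholder-signature"

// delta is the inner payload of a content_block_delta event.
type delta struct {
	Type      string `json:"type"`
	Signature string `json:"signature,omitempty"`
}

// event is one SSE event. Name goes on the "event:" line and is not
// part of the JSON body.
type event struct {
	Name  string `json:"-"`
	Type  string `json:"type"`
	Index int    `json:"index"`
	Delta *delta `json:"delta,omitempty"`
}

// thinkingCloseEvents returns the events that close a thinking block:
// a signature_delta first, then content_block_stop.
func thinkingCloseEvents(index int) []event {
	return []event{
		{
			Name:  "content_block_delta",
			Type:  "content_block_delta",
			Index: index,
			Delta: &delta{Type: "signature_delta", Signature: placeholderSignature},
		},
		{Name: "content_block_stop", Type: "content_block_stop", Index: index},
	}
}

// writeSSE serializes events in the "event: ...\ndata: ...\n\n" format.
func writeSSE(w io.Writer, events []event) error {
	for _, ev := range events {
		body, err := json.Marshal(ev)
		if err != nil {
			return err
		}
		if _, err := fmt.Fprintf(w, "event: %s\ndata: %s\n\n", ev.Name, body); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := writeSSE(os.Stdout, thinkingCloseEvents(0)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Returning both events from one helper keeps the ordering in a single place, so a thinking block cannot be closed without its signature_delta.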

2. Two separate message_delta events instead of one

The proxy emits:

message_delta  {stop_reason: "end_turn"}          ← first (no usage)
message_delta  {usage: {input_tokens: N, ...}}    ← second (no stop_reason)
message_stop

Anthropic spec requires a single message_delta with both stop_reason and usage merged:

message_delta  {stop_reason: "end_turn", usage: {input_tokens: N, ...}}
message_stop
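One way to get a single event, sketched in Go: buffer the stop reason and usage as they arrive from upstream and serialize them together just before message_stop. The `finisher` type and its methods are hypothetical, not existing proxy code. The JSON layout assumes Anthropic's wire format, where stop_reason sits under `delta` and usage is a sibling field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usage mirrors the token counts shown in the listing above.
type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
}

// stopDelta is the "delta" object of a message_delta event.
type stopDelta struct {
	StopReason string `json:"stop_reason"`
}

// messageDelta is the merged event body.
type messageDelta struct {
	Type  string    `json:"type"`
	Delta stopDelta `json:"delta"`
	Usage usage     `json:"usage"`
}

// finisher (hypothetical) collects the two pieces, which may arrive in
// separate upstream chunks, so they can be emitted as one event.
type finisher struct {
	stopReason string
	usage      usage
}

func (f *finisher) SetStopReason(reason string) { f.stopReason = reason }
func (f *finisher) SetUsage(u usage)            { f.usage = u }

// Final builds the single message_delta carrying both stop_reason and usage.
func (f *finisher) Final() messageDelta {
	return messageDelta{
		Type:  "message_delta",
		Delta: stopDelta{StopReason: f.stopReason},
		Usage: f.usage,
	}
}

func main() {
	var f finisher
	f.SetStopReason("end_turn")                          // from the finish chunk
	f.SetUsage(usage{InputTokens: 12, OutputTokens: 34}) // from the usage chunk

	body, err := json.Marshal(f.Final())
	if err != nil {
		panic(err)
	}
	fmt.Printf("event: message_delta\ndata: %s\n\n", body)
}
```

The trade-off is that nothing is written when stop_reason arrives; the event is deferred until the stream is about to end.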

3. kimi-k2.6 rejects thinking and reasoning_effort API parameters

kimi-k2.6 returns thinking blocks naturally (via reasoning_content in chunks) but does not accept the OpenAI-style thinking: {type: "enabled"} or reasoning_effort parameters in the request body — it returns HTTP 400.

The proxy's HasThinkingBlocks() check fires on the second turn (because the first response contained thinking), and the proxy then sends these parameters → 400 → circuit breaker opens → all subsequent requests fail.

Fix: only send thinking/reasoning_effort API params for DeepSeek models or models with explicit config, not for kimi.
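A rough Go sketch of that gate. `supportsThinkingAPI` is the name proposed in this issue; `thinkingOverrides`, `buildRequestBody`, the substring match on "deepseek", and the model name "deepseek-v3" are assumptions for illustration and do not reflect the actual request.go.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingOverrides stands in for explicit per-model config (hypothetical).
// An entry here wins over the name-based default below.
var thinkingOverrides = map[string]bool{}

// supportsThinkingAPI reports whether thinking/reasoning_effort params
// may be sent upstream for this model.
func supportsThinkingAPI(model string) bool {
	if allowed, ok := thinkingOverrides[model]; ok {
		return allowed
	}
	// Assumption: DeepSeek models are recognized by name.
	return strings.Contains(strings.ToLower(model), "deepseek")
}

// buildRequestBody adds the thinking param only when the conversation has
// thinking history AND the upstream model accepts the param.
func buildRequestBody(model string, hasThinkingBlocks bool) map[string]any {
	body := map[string]any{"model": model, "stream": true}
	if hasThinkingBlocks && supportsThinkingAPI(model) {
		body["thinking"] = map[string]any{"type": "enabled"}
	}
	return body
}

func main() {
	for _, model := range []string{"deepseek-v3", "kimi-k2.6"} {
		_, sent := buildRequestBody(model, true)["thinking"]
		fmt.Printf("%s: thinking param sent = %v\n", model, sent)
	}
}
```

An allowlist is used instead of an isKimiModel() denylist so that an unknown model defaults to not receiving the params, which avoids the 400 and the circuit breaker trip described above.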

Minimal reproduction

# First turn works, second turn fails (thinking history in messages)
curl -X POST http://127.0.0.1:3456/v1/messages \
  -H 'content-type: application/json' \
  -H 'anthropic-beta: interleaved-thinking-2025-05-14' \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 200,
    "stream": true,
    "thinking": {"type": "enabled", "budget_tokens": 5000},
    "messages": [
      {"role": "user", "content": "what is 2+2?"},
      {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "Let me compute", "signature": "abc"},
        {"type": "text", "text": "4"}
      ]},
      {"role": "user", "content": "and 3+3?"}
    ]
  }'
# → HTTP 400 from kimi-k2.6

Suggested fixes

stream.go:

  • Before each content_block_stop for a thinking block, emit a signature_delta event with a non-empty placeholder signature
  • Merge stop_reason and usage into a single message_delta event

request.go:

  • Add an isKimiModel() or supportsThinkingAPI() check — only send thinking/reasoning_effort params for DeepSeek or explicitly configured models, not for kimi

Environment

  • Claude Code v2.1.148
  • oc-go-cc built from current master
  • kimi-k2.6 via OpenCode Go (accounts/fireworks/models/kimi-k2p6)
