
[BUG] Gemini Thought Summaries Degrade into Raw Reasoning Over the Zoo LiteLLM Provider #736

@awschmeder

Terminology

  • LiteLLM Proxy -- the external OpenAI-compatible server. Stateless per request.
  • Zoo LiteLLM provider -- Zoo's client handler
    Zoo-Code/src/api/providers/lite-llm.ts, the code we
    own and change here.

Symptom

When a Gemini thinking model is used through the Zoo LiteLLM provider (talking to a LiteLLM Proxy
over /chat/completions), the first several turns render correctly: short thought summaries
appear in the collapsible "Thinking" block and the answer renders as normal text. After a few turns, the summaries disappear and Gemini emits long blocks of raw, verbose reasoning directly into the chat window's message output.

Root Cause

Gemini returns an encrypted continuity handle for each turn's thoughts on the Chat Completions
stream:

"choices": [{ "delta": { "provider_specific_fields": { "thought_signatures": ["<base64>"] } } }]

Gemini's contract: reasoning_content carries only a summary, and the model keeps emitting
summaries on later turns only if the prior turns' thought_signature values are sent back in the
request history. When a signature is missing or replaced with a placeholder, Gemini can no
longer continue its thought chain and reverts to regenerating full raw reasoning each turn. The
effect compounds over a conversation, matching the "summaries disappear after a while" report.

The Zoo LiteLLM provider:

  1. Discards the real signatures. The streaming loop in
    lite-llm.ts reads reasoning_content,
    content, and tool_calls, but never reads
    delta.provider_specific_fields.thought_signatures.
  2. Injects only a dummy bypass on tool calls.
    injectThoughtSignatureForGemini() stamps
    base64("skip_thought_signature_validator") onto assistant tool calls. This passes Gemini's
    validation (so tool calling does not 400) but provides no real thought continuity, and pure
    reasoning turns get no signature handling at all.
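A minimal sketch of the streaming loop with the missing read added, assuming each delta has already been parsed into a plain object shaped like the observed Chat Completions deltas. `LiteLLMDelta`, `StreamEvent`, and `consumeDeltas` are hypothetical names, not the actual code in lite-llm.ts. Keeping only the most recent signature mirrors the single `lastThoughtSignature` field in gemini.ts; that choice is an assumption worth checking against turns that emit several signatures.

```typescript
// Hypothetical minimal shape of a parsed delta from the LiteLLM Proxy.
// Field names follow the observed stream; everything else is assumed.
interface LiteLLMDelta {
  role?: string;
  content?: string | null;
  reasoning_content?: string | null;
  tool_calls?: unknown[];
  provider_specific_fields?: { thought_signatures?: string[] };
}

// Hypothetical events the loop hands to the UI layer.
type StreamEvent =
  | { type: "reasoning"; text: string }
  | { type: "text"; text: string }
  | { type: "tool_call_delta"; raw: unknown };

// Routes deltas the way the current loop does (reasoning, text, tool calls)
// and adds the missing step: remembering the latest thought signature.
function consumeDeltas(deltas: readonly LiteLLMDelta[]): {
  events: StreamEvent[];
  lastThoughtSignature: string | undefined;
} {
  const events: StreamEvent[] = [];
  let lastThoughtSignature: string | undefined = undefined;

  for (const delta of deltas) {
    if (delta.reasoning_content) {
      events.push({ type: "reasoning", text: delta.reasoning_content });
    }
    if (delta.content) {
      events.push({ type: "text", text: delta.content });
    }
    for (const call of delta.tool_calls ?? []) {
      events.push({ type: "tool_call_delta", raw: call });
    }
    // The read the current streaming loop skips.
    const sigs = delta.provider_specific_fields?.thought_signatures;
    if (sigs && sigs.length > 0) {
      lastThoughtSignature = sigs[sigs.length - 1];
    }
  }

  return { events, lastThoughtSignature };
}
```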

Why the Client Must Fix This

The LiteLLM Proxy's /chat/completions API is stateless -- the client owns and resends the full
message history each request, and the proxy retains nothing between calls. Continuity is therefore
only possible if the Zoo LiteLLM provider captures each turn's thought_signatures from the
response, persists them on the assistant message, and echoes them back in the next request.
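To illustrate the stateless round trip, here is a sketch that assumes the client keeps its own history array of OpenAI-style messages. `ChatMessage`, `ChatRequest`, `buildRequest`, and `recordAssistantTurn` are hypothetical; only the `provider_specific_fields.thought_signatures` field on the assistant message comes from the LiteLLM behavior described in this issue.

```typescript
// Hypothetical message and request shapes for an OpenAI-compatible
// /chat/completions call.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  provider_specific_fields?: { thought_signatures?: string[] };
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

// The proxy retains nothing between calls, so every request carries the
// full history plus the new user message.
function buildRequest(
  model: string,
  history: readonly ChatMessage[],
  userText: string,
): ChatRequest {
  return { model, messages: [...history, { role: "user", content: userText }] };
}

// After a response arrives, store the turn together with the signatures
// captured from the stream so later requests can echo them back.
function recordAssistantTurn(
  history: ChatMessage[],
  userText: string,
  answer: string,
  signatures: readonly string[],
): void {
  history.push({ role: "user", content: userText });
  const message: ChatMessage = { role: "assistant", content: answer };
  if (signatures.length > 0) {
    message.provider_specific_fields = { thought_signatures: [...signatures] };
  }
  history.push(message);
}
```

Each call to `buildRequest` resends the entire history, so a signature recorded on one turn is echoed on every later request without any proxy-side state.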

This works entirely on the existing Chat Completions transport (no Responses API needed):

  • The signature is already present on the Chat Completions delta (verified live).
  • The LiteLLM Proxy accepts a client-supplied
    provider_specific_fields.thought_signatures on an assistant message without error (verified
    live), and translates it to the upstream thoughtSignature part.

Existing Pipeline to Reuse

The native Gemini provider already implements the full round-trip; the fix makes the Zoo LiteLLM
provider participate in the same generic pipeline:

  1. Capture during streaming into an instance field
    (gemini.ts uses lastThoughtSignature).
  2. Expose via getThoughtSignature()
    (gemini.ts).
  3. Persist as a { type: "thoughtSignature", thoughtSignature } block -- already handled
    generically for non-Anthropic protocols by
    apiConversationHistory.ts.
  4. Replay the real signature on the next request (for LiteLLM, into
    provider_specific_fields.thought_signatures instead of the dummy).
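The four steps can be sketched end to end as follows. This is an illustration under assumptions, not Zoo's implementation: `SignatureCapturingHandler`, `persistAssistant`, `toLiteLLMMessage`, and the simplified block and message types are hypothetical. The `{ type: "thoughtSignature", thoughtSignature }` block shape, the `lastThoughtSignature` / `getThoughtSignature()` names, and the `thought_signatures` request field are taken from this issue.

```typescript
// Persisted block shape from step 3; the text block is a simplification.
type StoredBlock =
  | { type: "text"; text: string }
  | { type: "thoughtSignature"; thoughtSignature: string };

interface StoredAssistantMessage {
  role: "assistant";
  content: StoredBlock[];
}

// Hypothetical outgoing LiteLLM assistant message.
interface LiteLLMAssistantMessage {
  role: "assistant";
  content: string;
  provider_specific_fields?: { thought_signatures: string[] };
}

// Steps 1 and 2: capture into an instance field and expose it,
// following the pattern described for gemini.ts.
class SignatureCapturingHandler {
  private lastThoughtSignature: string | undefined = undefined;

  onDeltaSignatures(signatures: readonly string[] | undefined): void {
    if (signatures && signatures.length > 0) {
      this.lastThoughtSignature = signatures[signatures.length - 1];
    }
  }

  getThoughtSignature(): string | undefined {
    return this.lastThoughtSignature;
  }
}

// Step 3: persist the signature as its own block next to the text.
function persistAssistant(
  text: string,
  signature: string | undefined,
): StoredAssistantMessage {
  const content: StoredBlock[] = [{ type: "text", text }];
  if (signature !== undefined) {
    content.push({ type: "thoughtSignature", thoughtSignature: signature });
  }
  return { role: "assistant", content };
}

// Step 4: rebuild the request message, replaying real signatures
// instead of a placeholder.
function toLiteLLMMessage(stored: StoredAssistantMessage): LiteLLMAssistantMessage {
  let text = "";
  const signatures: string[] = [];
  for (const block of stored.content) {
    if (block.type === "text") {
      text += block.text;
    } else {
      signatures.push(block.thoughtSignature);
    }
  }
  const message: LiteLLMAssistantMessage = { role: "assistant", content: text };
  if (signatures.length > 0) {
    message.provider_specific_fields = { thought_signatures: signatures };
  }
  return message;
}
```

Storing the signature as a separate block keeps it out of the visible text while letting the history layer carry it through unchanged until replay.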

Evidence

Captured from a live LiteLLM Proxy running Gemini 3.5 Flash with reasoning_effort: high:

  • Delta keys observed: ['content', 'provider_specific_fields', 'reasoning_content', 'role'].
  • reasoning_content and content never co-occur in a single chunk.
  • provider_specific_fields.thought_signatures is present with base64 signatures.
  • An assistant message echoing provider_specific_fields.thought_signatures is accepted without
    error.
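To re-check these observations against another captured stream, a small hypothetical helper could summarize the parsed deltas. `summarizeStream` and its base64 test (a character-class regex, not a strict decoder) are assumptions for illustration only.

```typescript
// Loose shape for a parsed delta; extra keys are allowed on purpose.
interface ObservedDelta {
  [key: string]: unknown;
  content?: string | null;
  reasoning_content?: string | null;
  provider_specific_fields?: { thought_signatures?: unknown };
}

interface StreamSummary {
  keys: string[];                      // every top-level delta key seen, sorted
  reasoningAndContentCoOccur: boolean; // true if any single delta had both
  signatureCount: number;              // base64-looking signature strings seen
}

// Heuristic only: checks the base64 alphabet and padding, does not decode.
const BASE64_LIKE = /^[A-Za-z0-9+\/]+={0,2}$/;

function summarizeStream(deltas: readonly ObservedDelta[]): StreamSummary {
  const seen: Record<string, true> = {};
  let coOccur = false;
  let signatureCount = 0;

  for (const delta of deltas) {
    for (const key of Object.keys(delta)) {
      seen[key] = true;
    }
    if (delta.reasoning_content && delta.content) {
      coOccur = true;
    }
    const sigs = delta.provider_specific_fields?.thought_signatures;
    if (Array.isArray(sigs)) {
      for (const sig of sigs) {
        if (typeof sig === "string" && BASE64_LIKE.test(sig)) {
          signatureCount += 1;
        }
      }
    }
  }

  return {
    keys: Object.keys(seen).sort(),
    reasoningAndContentCoOccur: coOccur,
    signatureCount,
  };
}
```

Run over the deltas of one captured turn, a summary consistent with the observations above would list those four keys, report no co-occurrence, and count at least one signature.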

Scope

  • In scope: Gemini models via the Zoo LiteLLM provider on the existing /chat/completions
    transport.
  • Out of scope: OpenAI/xAI reasoning.encrypted_content continuity over LiteLLM (would
    require a Responses-API transport change) and OpenRouter non-tool reasoning continuity. Noted for future consideration.

Related GitHub Issues

No matching issue exists on Zoo-Code-Org/Zoo-Code. The closest siblings in the upstream
RooCodeInc/Roo-Code repository (from which this repo was rebranded) are listed below for context;
none of them covers the Zoo LiteLLM provider, which has this specific gap.

| # | State | Title |
|---|---|---|
| #12128 | open | xAI Responses API path replays persisted thinking blocks as plain assistant text |
| #11629 | open | gemini-3.1-pro via OpenRouter: assistant content[] stripped on Turn 2 |
| #11716 | closed | Vertex AI Gemini "Thought signature is not valid" error |
| #9499 | closed | gemini 3-pro tool calls missing thought_signature |
