Issue: Gemini Thought Summaries Degrade into Raw Reasoning Over the Zoo LiteLLM Provider
Terminology
- LiteLLM Proxy -- the external OpenAI-compatible server. Stateless per request.
- Zoo LiteLLM provider -- Zoo's client handler, Zoo-Code/src/api/providers/lite-llm.ts, the code we own and change here.
Symptom
When a Gemini thinking model is used through the Zoo LiteLLM provider (talking to a LiteLLM Proxy
over /chat/completions), the first several turns render correctly: short thought summaries
appear in the collapsible "Thinking" block and the answer renders as normal text. After a few turns, the summaries disappear and Gemini emits long blocks of raw, verbose reasoning directly into the chat window's message output.
Root Cause
Gemini returns an encrypted continuity handle (the thought signature) for each turn's thoughts on the Chat Completions
stream, in delta.provider_specific_fields.thought_signatures.
Gemini's contract: reasoning_content carries only a summary, and the model keeps emitting
summaries on later turns only if the prior turns' thought_signature is sent back in the
request history. When the signature is missing or replaced with a placeholder, Gemini can no
longer continue its thought chain and reverts to regenerating full raw reasoning each turn. The
effect compounds over a conversation, matching the "summaries disappear after a while" report.
The Zoo LiteLLM provider:
- Discards the real signatures. The streaming loop in lite-llm.ts reads reasoning_content, content, and tool_calls, but never reads delta.provider_specific_fields.thought_signatures.
- Injects only a dummy bypass on tool calls. injectThoughtSignatureForGemini() stamps base64("skip_thought_signature_validator") onto assistant tool calls. This passes Gemini's validation (so tool calling does not 400) but provides no real thought continuity, and pure reasoning turns get no signature handling at all.
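The missing read could look roughly like the sketch below. It is not the code in lite-llm.ts: the GeminiDelta type and the ThoughtSignatureCollector class are hypothetical names introduced for illustration, and the sketch assumes thought_signatures arrives as an array of base64 strings, which should be confirmed against a live stream.

```typescript
// Hypothetical model of a Chat Completions stream delta as seen through the
// LiteLLM Proxy. Only the fields named in this issue are included.
interface GeminiDelta {
	role?: string
	content?: string | null
	reasoning_content?: string | null
	// Assumption: signatures arrive as an array of base64 strings.
	provider_specific_fields?: { thought_signatures?: string[] } | null
}

// Hypothetical helper: accumulates the real signatures seen during one turn.
class ThoughtSignatureCollector {
	private signatures: string[] = []

	// Call once per streamed chunk, next to the existing reads of
	// reasoning_content, content, and tool_calls.
	observe(delta: GeminiDelta): void {
		const found = delta.provider_specific_fields?.thought_signatures
		if (!Array.isArray(found)) {
			return
		}
		for (const signature of found) {
			const isUsable = typeof signature === "string" && signature.length > 0
			// Skip empty values and signatures already recorded for this turn.
			if (isUsable && !this.signatures.includes(signature)) {
				this.signatures.push(signature)
			}
		}
	}

	// Returns a copy so callers cannot mutate the collector's state.
	getThoughtSignatures(): string[] {
		return [...this.signatures]
	}

	// Call at the start of each new request.
	reset(): void {
		this.signatures = []
	}
}
```

Deduplicating in observe() is a defensive choice in case the proxy repeats a signature across chunks; whether it actually does is not established here.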
Why the Client Must Fix This
The LiteLLM Proxy's /chat/completions API is stateless -- the client owns and resends the full
message history each request, and the proxy retains nothing between calls. Continuity is therefore
only possible if the Zoo LiteLLM provider captures each turn's thought_signatures from the
response, persists them on the assistant message, and echoes them back in the next request.
This works entirely on the existing Chat Completions transport (no Responses API needed):
- The signature is already present on the Chat Completions delta (verified live).
- The LiteLLM Proxy accepts a client-supplied provider_specific_fields.thought_signatures on an assistant message without error (verified live), and translates it to the upstream thoughtSignature part.
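The echo step described above might be sketched as follows. The AssistantMessage type and the withThoughtSignatures helper are hypothetical, not existing Zoo or LiteLLM APIs; the only field name taken from this issue is provider_specific_fields.thought_signatures, assumed here to be an array of strings.

```typescript
// Hypothetical, minimal shape of an assistant message in the request history.
interface AssistantMessage {
	role: "assistant"
	content: string
	provider_specific_fields?: { thought_signatures?: string[] }
}

// Hypothetical helper: returns a copy of the message carrying the real
// signatures captured from the previous response. With no signatures the
// message is returned unchanged, so non-Gemini turns are not affected.
function withThoughtSignatures(message: AssistantMessage, signatures: string[]): AssistantMessage {
	if (signatures.length === 0) {
		return message
	}
	return {
		...message,
		provider_specific_fields: {
			...message.provider_specific_fields,
			thought_signatures: [...signatures],
		},
	}
}
```

Because the proxy is stateless, this would have to be applied to every prior assistant turn each time the history is resent, not only to the most recent one.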
Existing Pipeline to Reuse
The native Gemini provider already implements the full round-trip; the fix makes the Zoo LiteLLM
provider participate in the same generic pipeline:
- Capture during streaming into an instance field (gemini.ts uses lastThoughtSignature).
- Expose via getThoughtSignature() (gemini.ts).
- Persist as a { type: "thoughtSignature", thoughtSignature } block -- already handled generically for non-Anthropic protocols by apiConversationHistory.ts.
- Replay the real signature on the next request (for LiteLLM, into provider_specific_fields.thought_signature instead of the dummy).
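The persist and read-back steps can be illustrated with a small sketch. Only the { type: "thoughtSignature", thoughtSignature } block shape comes from this issue; the TextBlock type and the toPersistedBlocks and splitPersistedBlocks helpers are hypothetical and do not reflect the actual code in apiConversationHistory.ts.

```typescript
type TextBlock = { type: "text"; text: string }
// Block shape named in this issue.
type ThoughtSignatureBlock = { type: "thoughtSignature"; thoughtSignature: string }
type ContentBlock = TextBlock | ThoughtSignatureBlock

// Hypothetical helper: builds the blocks stored for one assistant turn.
// The signature block is added only when the provider captured one.
function toPersistedBlocks(text: string, signature: string | undefined): ContentBlock[] {
	const blocks: ContentBlock[] = [{ type: "text", text }]
	if (signature) {
		blocks.push({ type: "thoughtSignature", thoughtSignature: signature })
	}
	return blocks
}

// Hypothetical helper: separates visible text from stored signatures so the
// replay step can send the text as content and the signatures alongside it.
function splitPersistedBlocks(blocks: ContentBlock[]): { text: string; signatures: string[] } {
	const textParts: string[] = []
	const signatures: string[] = []
	for (const block of blocks) {
		if (block.type === "text") {
			textParts.push(block.text)
		} else {
			signatures.push(block.thoughtSignature)
		}
	}
	return { text: textParts.join(""), signatures }
}
```

Keeping the signature in its own block, rather than inside the text, means it never reaches the chat window and can be dropped cleanly for providers that do not understand it.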
Evidence
Captured from a live LiteLLM Proxy running gemini 3.5 flash with reasoning_effort: high:
- Delta keys observed: ['content', 'provider_specific_fields', 'reasoning_content', 'role'].
- reasoning_content and content never co-occur in a single chunk.
- provider_specific_fields.thought_signatures is present with base64 signatures.
- An assistant message echoing provider_specific_fields.thought_signatures is accepted without error.
Scope
- In scope: Gemini models via the Zoo LiteLLM provider on the existing /chat/completions transport.
- Out of scope: OpenAI/xAI reasoning.encrypted_content continuity over LiteLLM (would require a Responses-API transport change) and OpenRouter non-tool reasoning continuity. Noted for future consideration.
Related GitHub Issues
No matching issue exists on Zoo-Code-Org/Zoo-Code. The closest upstream RooCodeInc/Roo-Code
siblings (rebranded repo) are listed for context; none of them concerns the LiteLLM provider, where this specific gap exists.
| # | State | Title |
| --- | --- | --- |
| #12128 | open | xAI Responses API path replays persisted thinking blocks as plain assistant text |
| #11629 | open | gemini-3.1-pro via OpenRouter: assistant content[] stripped on Turn 2 |
| #11716 | closed | Vertex AI Gemini "Thought signature is not valid" error |
| #9499 | closed | gemini 3-pro tool calls missing thought_signature |