fix: preserve reasoning_tokens and cached_tokens in usage conversion#174
Conversation
The strict mode adapter was hardcoding `reasoning_tokens: 0` and `cached_tokens: 0` in every chat completions → Responses API usage conversion, silently dropping the reasoning token counts reported by thinking models. The tool-loop accumulator also discarded `completion_tokens_details` and `prompt_tokens_details` entirely when summing usage across iterations.

Adds a `chat_usage_to_response_usage()` helper that extracts the real values from the raw JSON detail fields, and uses it in all five call sites (non-streaming adapter, streaming state, accumulation, unhandled tools path, final response path).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Fixes loss of token-usage detail when converting Chat Completions responses into Responses API responses in strict mode, ensuring `reasoning_tokens` (thinking models) and `cached_tokens` (prompt caching) are preserved end-to-end.
Changes:
- Added a shared `chat_usage_to_response_usage` helper to extract `cached_tokens`/`reasoning_tokens` from chat usage detail JSON into a typed `ResponseUsage`.
- Replaced hardcoded-zero usage conversions in the adapter, streaming, and handler paths with the shared helper.
- Updated tool-loop usage accumulation to preserve the latest `*_tokens_details` JSON instead of dropping it entirely.
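For illustration, such a helper could look roughly like this — a minimal sketch that uses a plain `BTreeMap` as a stand-in for the raw `serde_json::Value` detail blobs; the `ResponseUsage` field names and the function signature here are assumptions, not the crate's actual types:

```rust
use std::collections::BTreeMap;

/// Stand-in for a raw `*_tokens_details` JSON object
/// (`serde_json::Value` in the real code).
type Details = BTreeMap<String, u64>;

/// Hypothetical mirror of the typed Responses API usage struct;
/// field names are illustrative.
#[derive(Debug, Default, PartialEq)]
struct ResponseUsage {
    input_tokens: u32,
    output_tokens: u32,
    cached_tokens: u32,
    reasoning_tokens: u32,
}

/// Read one count out of an optional detail blob: a missing blob or
/// field becomes 0, and oversized values clamp to `u32::MAX` instead
/// of wrapping.
fn detail_count(details: Option<&Details>, key: &str) -> u32 {
    details
        .and_then(|d| d.get(key))
        .map(|&n| u32::try_from(n).unwrap_or(u32::MAX))
        .unwrap_or(0)
}

/// Convert chat-completions usage into the typed Responses shape,
/// carrying the real cached/reasoning counts through instead of zeros.
fn chat_usage_to_response_usage(
    prompt_tokens: u32,
    completion_tokens: u32,
    prompt_details: Option<&Details>,
    completion_details: Option<&Details>,
) -> ResponseUsage {
    ResponseUsage {
        input_tokens: prompt_tokens,
        output_tokens: completion_tokens,
        cached_tokens: detail_count(prompt_details, "cached_tokens"),
        reasoning_tokens: detail_count(completion_details, "reasoning_tokens"),
    }
}
```

The key point is that absence and malformed values both degrade to 0 rather than poisoning the conversion.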
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/strict/mod.rs` | Introduces shared chat→responses usage conversion helper that extracts `cached_tokens` and `reasoning_tokens`. |
| `src/strict/adapter.rs` | Uses shared helper when mapping `ChatCompletionResponse.usage` into `ResponsesResponse.usage`. |
| `src/strict/handlers.rs` | Uses shared helper for aggregate usage; preserves `*_tokens_details` JSON during accumulation. |
| `src/strict/streaming.rs` | Uses shared helper to capture usage from final streaming chunk. |
- Clamp `cached_tokens` / `reasoning_tokens` to `u32::MAX` when parsing from untrusted JSON so oversized values don't silently wrap.
- Sum `prompt_tokens_details` / `completion_tokens_details` across tool-loop iterations via a new `merge_usage_details()` helper, so the aggregate detail counts stay consistent with the summed totals instead of keeping only one iteration's values.
- Add unit tests for `chat_usage_to_response_usage` (extraction, missing details, u32 clamping) and `merge_usage_details` (summing, key union, None handling).
- Add an adapter test verifying `to_responses_response` extracts cached/reasoning tokens from the chat response's detail JSON.
- Add a streaming test verifying a chunk with populated usage details flows through `StreamingState` to `ResponseUsage`.
- Extend `test_token_accumulation_across_iterations` to exercise the new summing behavior for detail counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
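A sketch of what that key-union summing could look like, using a plain `BTreeMap` as a stand-in for the `serde_json` detail objects — the name `merge_usage_details` comes from the commit message, but this signature is an assumption:

```rust
use std::collections::BTreeMap;

/// Stand-in for a `*_tokens_details` JSON object
/// (`serde_json::Value` in the real code).
type Details = BTreeMap<String, u64>;

/// Sum two optional detail maps key-by-key: the result covers the
/// union of keys, and counts add with saturation so oversized values
/// don't wrap.
fn merge_usage_details(a: Option<Details>, b: Option<Details>) -> Option<Details> {
    match (a, b) {
        (None, None) => None,
        // Only one side has details: keep it as-is.
        (Some(m), None) | (None, Some(m)) => Some(m),
        // Both sides present: fold the second into the first.
        (Some(mut acc), Some(other)) => {
            for (key, count) in other {
                let slot = acc.entry(key).or_insert(0);
                *slot = slot.saturating_add(count);
            }
            Some(acc)
        }
    }
}
```

Summing per key (rather than keeping one iteration's blob) is what keeps the detail counts consistent with the already-summed `prompt_tokens`/`completion_tokens` totals.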
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
…g-tokens-in-usage # Conflicts: # src/strict/handlers.rs
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
## 🤖 New release

* `onwards`: 0.24.1 -> 0.24.2 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>
<blockquote>

## [0.24.2](v0.24.1...v0.24.2) - 2026-04-15

### Fixed

- preserve reasoning_tokens and cached_tokens in usage conversion ([#174](#174))

</blockquote>
</p></details>

---

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Problem
When the Responses API adapter converts a Chat Completions response to a Responses API response, it constructs a `ResponseUsage` from the upstream `Usage`. The previous code hardcoded both detail fields to zero.

This silently dropped the `reasoning_tokens` count reported by thinking models (DeepSeek-R1, Qwen3, etc.) and the `cached_tokens` count reported by providers that support prompt caching, even when the upstream had correctly populated `completion_tokens_details.reasoning_tokens` and `prompt_tokens_details.cached_tokens`.

The bug was duplicated across five call sites:

- `adapter.rs` — non-streaming `to_responses_response_with_usage` default usage
- `streaming.rs` — `StreamingState` usage capture
- `handlers.rs` × 3 — usage accumulation across tool-loop iterations, the unhandled-tools response path, and the final response path

The accumulation site was additionally dropping the entire `completion_tokens_details` and `prompt_tokens_details` JSON blobs (= `None`) when summing across iterations, so even the chat-completions passthrough for tool-loop responses lost the detail.

Fix
Adds a single shared helper in `src/strict/mod.rs`.

It extracts `reasoning_tokens` from `completion_tokens_details.reasoning_tokens` and `cached_tokens` from `prompt_tokens_details.cached_tokens` (both stored as raw `serde_json::Value` on the chat completions side), and threads them through into the typed Responses API fields. All five call sites now use this helper.

The accumulation site in `handlers.rs` now preserves the latest detail JSON across iterations using `.or()` instead of dropping it.

Test plan
- `cargo build` — clean, no warnings
- `cargo test` — all 317 tests pass
- `cargo clippy` — no new warnings introduced

🤖 Generated with Claude Code
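A footnote on the `.or()`-based preservation described in the fix: `Option::or` returns `self` when it is `Some`, so keeping the *latest* detail blob means calling it on the new value, not on the accumulator. A hypothetical sketch (names are illustrative, not the actual `handlers.rs` code):

```rust
/// Prefer the latest iteration's detail blob, falling back to the
/// previously accumulated one when the latest chunk has no details.
fn keep_latest<T>(acc: Option<T>, latest: Option<T>) -> Option<T> {
    // `latest.or(acc)` returns `latest` when it is `Some`;
    // `acc.or(latest)` would keep the *first* blob seen instead.
    latest.or(acc)
}
```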