
fix: preserve reasoning_tokens and cached_tokens in usage conversion#174

Merged

pjb157 merged 3 commits into main from fix/preserve-reasoning-tokens-in-usage
Apr 15, 2026

fix: preserve reasoning_tokens and cached_tokens in usage conversion#174
pjb157 merged 3 commits intomainfrom
fix/preserve-reasoning-tokens-in-usage

Conversation

@pjb157 pjb157 commented Apr 10, 2026

Problem

When the Responses API adapter converts a Chat Completions response to a Responses API response, it constructs a ResponseUsage from the upstream Usage. The previous code hardcoded both detail fields to zero:

input_tokens_details: InputTokensDetails { cached_tokens: 0 },
output_tokens_details: OutputTokensDetails { reasoning_tokens: 0 },

This silently dropped the reasoning_tokens count reported by thinking models (DeepSeek-R1, Qwen3, etc.) and the cached_tokens count reported by providers that support prompt caching, even when the upstream had populated completion_tokens_details.reasoning_tokens and prompt_tokens_details.cached_tokens correctly.

The bug was duplicated across five call sites:

  • adapter.rs — non-streaming to_responses_response_with_usage default usage
  • streaming.rs — StreamingState usage capture
  • handlers.rs × 3 — usage accumulation across tool-loop iterations, the unhandled-tools response path, and the final response path

The accumulation site was additionally dropping the entire completion_tokens_details and prompt_tokens_details JSON blobs (resetting them to None) when summing across iterations, so even the chat-completions passthrough for tool-loop responses lost the detail fields.

Fix

Adds a single shared helper in src/strict/mod.rs:

pub(crate) fn chat_usage_to_response_usage(
    u: &schemas::chat_completions::Usage,
) -> schemas::responses::ResponseUsage

It extracts reasoning_tokens from completion_tokens_details.reasoning_tokens and cached_tokens from prompt_tokens_details.cached_tokens (both stored as raw serde_json::Value on the chat completions side), and threads them through into the typed Responses API fields. All five call sites now use this helper.
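The extraction can be sketched as follows. This is a minimal, dependency-free approximation: the real helper operates on the project's schema types and raw serde_json::Value detail blobs, which are modeled here as simplified stand-in structs with Option<HashMap<String, u64>> detail maps, so the type and field names below are assumptions, not the actual code.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the chat-completions Usage and the Responses API
// ResponseUsage; the real detail blobs are raw serde_json::Value.
#[derive(Default)]
struct ChatUsage {
    prompt_tokens: u32,
    completion_tokens: u32,
    total_tokens: u32,
    prompt_tokens_details: Option<HashMap<String, u64>>,
    completion_tokens_details: Option<HashMap<String, u64>>,
}

struct ResponseUsage {
    input_tokens: u32,
    output_tokens: u32,
    total_tokens: u32,
    cached_tokens: u32,
    reasoning_tokens: u32,
}

// Pull one named counter out of an optional detail map, defaulting to 0
// when the map or key is absent, and clamping to u32::MAX so oversized
// upstream values cannot wrap.
fn detail_u32(details: &Option<HashMap<String, u64>>, key: &str) -> u32 {
    details
        .as_ref()
        .and_then(|d| d.get(key).copied())
        .map(|v| v.min(u32::MAX as u64) as u32)
        .unwrap_or(0)
}

fn chat_usage_to_response_usage(u: &ChatUsage) -> ResponseUsage {
    ResponseUsage {
        input_tokens: u.prompt_tokens,
        output_tokens: u.completion_tokens,
        total_tokens: u.total_tokens,
        cached_tokens: detail_u32(&u.prompt_tokens_details, "cached_tokens"),
        reasoning_tokens: detail_u32(&u.completion_tokens_details, "reasoning_tokens"),
    }
}

fn main() {
    // Missing detail maps fall back to zero rather than erroring.
    let r = chat_usage_to_response_usage(&ChatUsage::default());
    assert_eq!(r.cached_tokens, 0);
    assert_eq!(r.reasoning_tokens, 0);
}
```

The key point is that a missing or malformed detail field degrades to 0 (the old hardcoded behavior) rather than failing the conversion, while a populated field is threaded through into the typed Responses API output.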

The accumulation site in handlers.rs now preserves the latest detail JSON across iterations using .or() instead of dropping it.
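The `.or()` pattern can be illustrated with a small stand-in (the real accumulator works on the chat-completions Usage type with raw JSON detail blobs; here the blob is modeled as an Option<String> and the struct is a hypothetical simplification):

```rust
// Stand-in for the chat-completions usage being summed across tool-loop
// iterations; the detail blob is a raw JSON value in the real code.
#[derive(Default)]
struct Usage {
    completion_tokens: u32,
    completion_tokens_details: Option<String>,
}

fn accumulate(acc: &mut Usage, next: Usage) {
    acc.completion_tokens += next.completion_tokens;
    // Previously this was reset to None, discarding the blob entirely.
    // `.or()` keeps the latest populated value: prefer `next`'s details,
    // fall back to whatever the accumulator already holds.
    acc.completion_tokens_details =
        next.completion_tokens_details.or(acc.completion_tokens_details.take());
}

fn main() {
    let mut acc = Usage {
        completion_tokens: 10,
        completion_tokens_details: Some(r#"{"reasoning_tokens":5}"#.to_string()),
    };
    // An iteration with no detail blob must not wipe out the earlier one.
    accumulate(&mut acc, Usage { completion_tokens: 7, completion_tokens_details: None });
    assert_eq!(acc.completion_tokens, 17);
    assert!(acc.completion_tokens_details.is_some());
}
```

Note that `.or()` keeps only one iteration's detail values rather than summing them; a follow-up commit in this PR replaces this with a true key-by-key merge.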

Test plan

  • cargo build — clean, no warnings
  • cargo test — all 317 tests pass
  • cargo clippy — no new warnings introduced

🤖 Generated with Claude Code

The strict mode adapter was hardcoding `reasoning_tokens: 0` and
`cached_tokens: 0` in every chat completions → Responses API usage
conversion, silently dropping the reasoning token counts reported by
thinking models. The tool-loop accumulator also discarded
`completion_tokens_details` and `prompt_tokens_details` entirely when
summing usage across iterations.

Adds `chat_usage_to_response_usage()` helper that extracts the real
values from the raw JSON detail fields, and uses it in all five call
sites (non-streaming adapter, streaming state, accumulation, unhandled
tools path, final response path).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI left a comment


Pull request overview

Fixes loss of token-usage detail when converting Chat Completions responses into Responses API responses in strict-mode, ensuring reasoning_tokens (thinking models) and cached_tokens (prompt caching) are preserved end-to-end.

Changes:

  • Added a shared chat_usage_to_response_usage helper to extract cached_tokens / reasoning_tokens from chat usage detail JSON into typed ResponseUsage.
  • Replaced hardcoded-zero usage conversions in adapter, streaming, and handler paths to use the shared helper.
  • Updated tool-loop usage accumulation to preserve the latest *_tokens_details JSON instead of dropping it entirely.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File descriptions:

  • src/strict/mod.rs — Introduces shared chat→responses usage conversion helper that extracts cached_tokens and reasoning_tokens.
  • src/strict/adapter.rs — Uses shared helper when mapping ChatCompletionResponse.usage into ResponsesResponse.usage.
  • src/strict/handlers.rs — Uses shared helper for aggregate usage; preserves *_tokens_details JSON during accumulation.
  • src/strict/streaming.rs — Uses shared helper to capture usage from final streaming chunk.


Comment thread src/strict/mod.rs Outdated
Comment thread src/strict/handlers.rs
Comment thread src/strict/adapter.rs
Comment thread src/strict/streaming.rs
- Clamp `cached_tokens` / `reasoning_tokens` to `u32::MAX` when parsing
  from untrusted JSON so oversized values don't silently wrap.
- Sum `prompt_tokens_details` / `completion_tokens_details` across
  tool-loop iterations via new `merge_usage_details()` helper, so the
  aggregate detail counts stay consistent with the summed totals
  instead of keeping only one iteration's values.
- Add unit tests for `chat_usage_to_response_usage` (extraction,
  missing details, u32 clamping) and `merge_usage_details` (summing,
  key union, None handling).
- Add adapter test verifying `to_responses_response` extracts
  cached/reasoning tokens from the chat response's detail JSON.
- Add streaming test verifying a chunk with populated usage details
  flows through `StreamingState` to `ResponseUsage`.
- Extend `test_token_accumulation_across_iterations` to exercise the
  new summing behavior for detail counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
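The clamping and summing behavior described in this commit can be sketched as follows. The helper name `merge_usage_details()` comes from the commit message, but the signature shown here is an assumption: the real helper works on raw serde_json objects, modeled below as Option<HashMap<String, u64>>.

```rust
use std::collections::HashMap;

// Clamp an untrusted u64 counter into u32 without wrapping, mirroring the
// commit's fix for oversized values parsed from upstream JSON.
fn clamp_u32(v: u64) -> u32 {
    v.min(u32::MAX as u64) as u32
}

// Merge two optional detail maps across tool-loop iterations: take the union
// of keys and sum the values (saturating), so the aggregate detail counts
// stay consistent with the summed token totals.
fn merge_usage_details(
    acc: Option<HashMap<String, u64>>,
    next: Option<HashMap<String, u64>>,
) -> Option<HashMap<String, u64>> {
    match (acc, next) {
        (None, None) => None,
        (Some(m), None) | (None, Some(m)) => Some(m),
        (Some(mut a), Some(b)) => {
            for (k, v) in b {
                let slot = a.entry(k).or_insert(0);
                *slot = slot.saturating_add(v);
            }
            Some(a)
        }
    }
}

fn main() {
    assert_eq!(clamp_u32(u32::MAX as u64 + 1), u32::MAX);

    let a = HashMap::from([("reasoning_tokens".to_string(), 10u64)]);
    let b = HashMap::from([
        ("reasoning_tokens".to_string(), 5u64),
        ("audio_tokens".to_string(), 2u64),
    ]);
    let merged = merge_usage_details(Some(a), Some(b)).unwrap();
    assert_eq!(merged["reasoning_tokens"], 15); // summed across iterations
    assert_eq!(merged["audio_tokens"], 2);      // key union, not intersection
    assert!(merge_usage_details(None, None).is_none());
}
```

Summing instead of keeping the latest value matters for the aggregate: if iteration 1 reports 10 reasoning tokens and iteration 2 reports 5, the accumulated response should report 15 to match the summed completion_tokens total.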

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread src/strict/mod.rs
…g-tokens-in-usage

# Conflicts:
#	src/strict/handlers.rs

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.



@pjb157 pjb157 merged commit 7437aea into main Apr 15, 2026
9 checks passed
@github-actions github-actions bot mentioned this pull request Apr 15, 2026
pjb157 pushed a commit that referenced this pull request Apr 15, 2026
## 🤖 New release

* `onwards`: 0.24.1 -> 0.24.2 (✓ API compatible changes)

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [0.24.2](v0.24.1...v0.24.2) - 2026-04-15

### Fixed

- preserve reasoning_tokens and cached_tokens in usage conversion
([#174](#174))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
