Expose cached token counts in ResponseMetaInfo #1275

@ramapitecusment

Description
Feature Request: Expose cached token counts in ResponseMetaInfo

Problem

ResponseMetaInfo does not include cached input token counts, even though Koog already parses this data from provider responses.

I'm building a chat application that tracks costs per conversation. OpenAI returns prompt_tokens_details.cached_tokens in responses, and Koog correctly deserializes this into PromptTokensDetails.cachedTokens. However, this data is dropped in createMetaInfo() and never reaches my application code.

Current Behavior

AbstractOpenAILLMClient.kt:482-487

protected fun createMetaInfo(usage: OpenAIUsage?): ResponseMetaInfo = ResponseMetaInfo.create(
    clock,
    totalTokensCount = usage?.totalTokens,
    inputTokensCount = usage?.promptTokens,
    outputTokensCount = usage?.completionTokens
    // promptTokensDetails.cachedTokens is not passed
)

The metadata parameter exists in ResponseMetaInfo.create() but is never used.

Data That Gets Lost

OpenAIDataModels.kt:916-920 - Parsed but not exposed:

@Serializable
public class PromptTokensDetails(
    public val audioTokens: Int? = null,
    public val cachedTokens: Int? = null,  // This field exists
)

OpenAIDataModels.kt:901-907 - Also parsed but not exposed:

@Serializable
public class CompletionTokensDetails(
    public val acceptedPredictionTokens: Int? = null,
    public val audioTokens: Int? = null,
    public val reasoningTokens: Int? = null,
    public val rejectedPredictionTokens: Int? = null,
)

Proposed Solution

Populate the existing metadata: JsonObject? field in ResponseMetaInfo:

protected fun createMetaInfo(usage: OpenAIUsage?): ResponseMetaInfo {
    // Use a named lambda parameter to avoid shadowing `it` in the nested lets
    val tokenDetails = usage?.let { u ->
        buildJsonObject {
            u.promptTokensDetails?.cachedTokens?.let { put("cachedInputTokens", it) }
            u.completionTokensDetails?.reasoningTokens?.let { put("reasoningTokens", it) }
        }.takeIf { it.isNotEmpty() }
    }

    return ResponseMetaInfo.create(
        clock,
        totalTokensCount = usage?.totalTokens,
        inputTokensCount = usage?.promptTokens,
        outputTokensCount = usage?.completionTokens,
        metadata = tokenDetails
    )
}

This is non-breaking since metadata already exists and accepts JsonObject?.
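On the consumer side, application code could then recover the cached count from the metadata object. A minimal sketch, assuming the `"cachedInputTokens"` key proposed above (the helper function here is hypothetical, not an existing Koog API):

```kotlin
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.buildJsonObject
import kotlinx.serialization.json.int
import kotlinx.serialization.json.jsonPrimitive
import kotlinx.serialization.json.put

// Read the cached-token count out of ResponseMetaInfo.metadata,
// falling back to 0 when the provider reported no cache usage.
fun cachedInputTokens(metadata: JsonObject?): Int =
    metadata?.get("cachedInputTokens")?.jsonPrimitive?.int ?: 0

fun main() {
    val meta = buildJsonObject { put("cachedInputTokens", 5000) }
    println(cachedInputTokens(meta)) // prints 5000
    println(cachedInputTokens(null)) // prints 0
}
```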

Why This Matters

Cached tokens cost 50-90% less than regular input tokens depending on provider:

  • OpenAI: 50% discount
  • Anthropic: 90% discount
  • Gemini: 75% discount

Without this data, applications either overcharge users or absorb the difference. For a typical chat session with 5K cached tokens and 1K new tokens using Claude Sonnet, the cost difference is ~$0.013 per request.
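To make the arithmetic explicit, here is a sketch of that example. The prices are assumptions for illustration (roughly $3.00 per million input tokens for Claude Sonnet, with cache reads at the 90% discount noted above); check current provider pricing:

```kotlin
// Assumed prices per million tokens (USD); verify against current provider pricing.
const val INPUT_PRICE_PER_M = 3.00   // regular input tokens
const val CACHED_PRICE_PER_M = 0.30  // cached reads at a 90% discount

// Input cost of a request where some tokens were served from the provider's cache.
fun inputCost(freshTokens: Int, cachedTokens: Int): Double =
    freshTokens / 1e6 * INPUT_PRICE_PER_M + cachedTokens / 1e6 * CACHED_PRICE_PER_M

fun main() {
    // 5K cached + 1K new tokens, billed correctly vs. all 6K billed as fresh input
    val actual = inputCost(freshTokens = 1_000, cachedTokens = 5_000)
    val naive = inputCost(freshTokens = 6_000, cachedTokens = 0)
    println(naive - actual) // ~0.0135 USD overbilled per request when cache data is dropped
}
```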

Files to Modify

  1. AbstractOpenAILLMClient.kt - Update createMetaInfo()
  2. Similar changes needed in Anthropic and Google clients for their cache fields
