
feat: track per-generation token usage in telemetry#325

Closed
suryaiyer95 wants to merge 2 commits into main from worktree-tokens-telemetry

Conversation


@suryaiyer95 (Contributor) commented Mar 20, 2026

Summary

  • Emits the generation telemetry event (previously defined but never fired) on every LLM step-finish
  • Tracks input, output, reasoning, cache-read, and cache-write tokens per generation in Azure App Insights as flat measurements
  • Includes model ID, provider ID, agent name, finish reason, cost, and step duration alongside the token breakdown
  • Optional token fields (tokens_reasoning, tokens_cache_read, tokens_cache_write) are only included when the provider actually returns them — never defaulted to 0
  • Updates telemetry.md with accurate description of the generation event fields

Design Decisions

Flat fields, not nested objects — Azure App Insights custom measurements must be top-level numbers. The generation event type now uses tokens_input, tokens_output, etc. directly instead of a nested tokens: TokensPayload object.
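The constraint can be sketched as follows. This is an illustrative splitter, not the repo's actual exporter code; the helper name and envelope shapes are assumptions. App Insights custom events carry flat `properties` (strings) and `measurements` (numbers), so only top-level numeric fields survive the split:

```typescript
// Illustrative splitter (hypothetical helper, not the actual toAppInsightsEnvelopes):
// App Insights custom events carry flat `properties` (strings) and
// `measurements` (numbers) only, so nested objects have nowhere to go.
function splitForAppInsights(event: Record<string, unknown>) {
  const properties: Record<string, string> = {}
  const measurements: Record<string, number> = {}
  for (const [key, value] of Object.entries(event)) {
    if (typeof value === "number") measurements[key] = value
    else if (typeof value === "string") properties[key] = value
    // A nested `tokens` object would be silently dropped here,
    // which is why the generation event uses flat tokens_* fields.
  }
  return { properties, measurements }
}

const { measurements } = splitForAppInsights({
  type: "generation",
  model_id: "claude-3",
  tokens_input: 100,
  tokens_output: 200,
})
// measurements: { tokens_input: 100, tokens_output: 200 }
```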

"Not available" vs "zero" — Each optional field uses the raw AI SDK usage values to determine availability:

  • tokens_reasoning: only when value.usage.reasoningTokens !== undefined (reasoning models)
  • tokens_cache_read: only when value.usage.cachedInputTokens !== undefined (cache hit)
  • tokens_cache_write: only when usage.tokens.cache.write > 0 (Anthropic/Bedrock metadata)

Step duration — Tracked via stepStartTime set at start-step, computed at finish-step.
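The availability checks above can be sketched like this. The `Usage` shape mirrors the AI SDK fields named in this PR, but the helper name and overall shape are assumptions for illustration, not the actual processor.ts code:

```typescript
// Hypothetical sketch of the conditional emission described above;
// the helper name is illustrative, not the actual processor.ts code.
interface Usage {
  inputTokens: number
  outputTokens: number
  reasoningTokens?: number
  cachedInputTokens?: number
}

function tokenFields(usage: Usage, cacheWriteTokens?: number): {
  tokens_input: number
  tokens_output: number
  tokens_reasoning?: number
  tokens_cache_read?: number
  tokens_cache_write?: number
} {
  return {
    tokens_input: usage.inputTokens,
    tokens_output: usage.outputTokens,
    // Optional fields are added only when the provider reported them,
    // never defaulted to 0.
    ...(usage.reasoningTokens !== undefined ? { tokens_reasoning: usage.reasoningTokens } : {}),
    ...(usage.cachedInputTokens !== undefined ? { tokens_cache_read: usage.cachedInputTokens } : {}),
    ...(cacheWriteTokens !== undefined && cacheWriteTokens > 0 ? { tokens_cache_write: cacheWriteTokens } : {}),
  }
}

// A non-reasoning, non-cached call yields only the two required fields:
// tokenFields({ inputTokens: 100, outputTokens: 50 })
//   returns { tokens_input: 100, tokens_output: 50 }
```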

Note on "1 input token" in traces

The existing tokens_input correctly reflects Anthropic's semantics: after the first step, all previous context is cached, so inputTokens from Anthropic is only the new tokens added since the last step. The large cached portion appears in tokens_cache_read. This is correct behavior — total = input + output + cache_read + cache_write.
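Hypothetical numbers (not from a real trace) make the accounting concrete:

```typescript
// Hypothetical second-step usage after Anthropic's prompt caching kicks in.
// Only the tokens added since the last step count as fresh input;
// the previously cached context reappears as cache reads.
const tokens_input = 1           // new tokens since the last step
const tokens_output = 350        // tokens generated this step
const tokens_cache_read = 12480  // cached context reused this step

// Total context actually processed this step:
const total = tokens_input + tokens_output + tokens_cache_read
// total === 12831; the "1 input token" is Anthropic's semantics, not a bug
```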

Checklist

  • Tests pass (pre-existing failures unrelated to this change — @opencode-ai/util missing in worktree)
  • Upstream markers added in processor.ts
  • Docs updated: docs/docs/reference/telemetry.md

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Documentation

    • Telemetry docs updated: generation events now specify "AI model generation (step) completes" and list provider ID, agent, finish reason, cost, and detailed token breakdown; prompts are not collected.
  • Chores

    • Telemetry now emits per-step timing, cost, and flattened token metrics (input/output and optional reasoning/cache fields) for generation events.

…he/reasoning)

- Emit `generation` telemetry event on every LLM step-finish with model_id,
  provider_id, agent, finish_reason, cost, duration_ms, and token breakdown
- Token fields are flat (no nested objects) to comply with Azure App Insights
  custom measurements schema: `tokens_input`, `tokens_output`, and
  optionally `tokens_reasoning`, `tokens_cache_read`, `tokens_cache_write`
- Optional token fields are only included when the provider actually returns
  them — reasoning only for reasoning models, cache_read/write only when
  prompt caching is active — never defaulted to 0
- Step duration tracked from `start-step` to `finish-step` events
- Adds `altimate_change` markers in `processor.ts` (upstream file)
- Updates telemetry.md docs with accurate generation event description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 20, 2026 08:19

@claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review.


coderabbitai bot commented Mar 20, 2026

📝 Walkthrough


Generation telemetry was changed to emit flattened top-level token fields, and step timing telemetry was added. The session processor now emits a generation telemetry event at step completion with model/provider/agent, finish reason, cost, duration_ms, and conditional token fields (tokens_input, tokens_output, tokens_reasoning, tokens_cache_read, tokens_cache_write). Documentation was updated accordingly.

Changes

  • Documentation (docs/docs/reference/telemetry.md): Reworded the generation event description to "AI model generation (step) completes", expanded the metadata fields and detailed token breakdown including cache read/write; reiterated that no prompt content is collected.
  • Telemetry Type Definition (packages/opencode/src/altimate/telemetry/index.ts): Removed the nested tokens: TokensPayload and added flat numeric fields: tokens_input and tokens_output (required) plus optional tokens_reasoning, tokens_cache_read, and tokens_cache_write; eliminated the TokensPayload type from this file.
  • Session Instrumentation (packages/opencode/src/session/processor.ts): Imported telemetry and added per-step timing; on finish-step, emits Telemetry.track(...) with generation metadata (session/message ids, model, provider, agent, finish_reason, cost, duration_ms) and the flattened token fields, conditionally.
  • Tests (packages/opencode/test/session/processor.test.ts, packages/opencode/test/telemetry/telemetry.test.ts): Updated tests to expect the flattened token fields (tokens_input, tokens_output, tokens_reasoning, tokens_cache_read, tokens_cache_write) instead of a nested tokens object; adjusted assertions accordingly.

Sequence Diagram(s)

sequenceDiagram
    participant SessionProcessor as SessionProcessor
    participant Telemetry as TelemetryModule
    participant AppInsights as AppInsightsExporter

    SessionProcessor->>SessionProcessor: start-step (record stepStartTime)
    SessionProcessor->>SessionProcessor: run step / receive response (usage, tokens, cost, finish_reason)
    SessionProcessor->>Telemetry: track(generation event with sessionId, messageId, model, provider, agent, finish_reason, cost, duration_ms, tokens_input, tokens_output, tokens_reasoning?, tokens_cache_read?, tokens_cache_write?)
    Telemetry->>AppInsights: toAppInsightsEnvelopes(flat measurements & properties)
    AppInsights-->>Telemetry: accept/enqueue
    Telemetry-->>SessionProcessor: ack (telemetry emitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • mdesmet

Poem

🐇 I hopped through logs with a twitchy nose,
Flattened tokens where the data flows.
Timing my steps and counting each bite,
Cache reads and writes tucked in plain sight.
A little rabbit, telemetry in tow—hoppity, ho!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: the title "feat: track per-generation token usage in telemetry" clearly and concisely summarizes the main change: adding telemetry tracking for token usage metrics during LLM generation steps.
  • Description check ✅ Passed: the PR description covers all required template sections: Summary explains what changed and why, Test Plan confirms tests pass, and Checklist marks completed items including tests, documentation, and design decisions.



@coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/opencode/src/altimate/telemetry/index.ts (1)

17-23: Remove the unused TokensPayload type.

The TokensPayload type (lines 17-23) is no longer referenced anywhere in the codebase after the schema was changed to use flat tokens_* fields. Removing it eliminates dead code and prevents confusion about the expected payload structure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/telemetry/index.ts` around lines 17 - 23,
Remove the unused TokensPayload type declaration (export type TokensPayload)
from the altimate telemetry module; delete the block defining
input/output/reasoning/cache_read/cache_write and any exports of that symbol,
then run TypeScript type-check to confirm no remaining references and
update/remove any imports that referenced TokensPayload elsewhere (if found) so
the code compiles with the schema using flat tokens_* fields.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a9acd731-145c-4d0c-b8a5-039dc6c5ae3d

📥 Commits

Reviewing files that changed from the base of the PR and between df24e73 and 2038e2f.

📒 Files selected for processing (3)
  • docs/docs/reference/telemetry.md
  • packages/opencode/src/altimate/telemetry/index.ts
  • packages/opencode/src/session/processor.ts


Copilot AI left a comment


Pull request overview

Adds per-generation (generation) telemetry emission on each LLM step completion, including a detailed token breakdown and step duration, and updates the public telemetry reference docs accordingly.

Changes:

  • Emit the generation telemetry event on every finish-step, including cost, finish reason, duration, and token measurements.
  • Update the Telemetry.Event schema to use flat tokens_* numeric fields (instead of a nested tokens object) to satisfy Azure App Insights measurement constraints.
  • Update docs/docs/reference/telemetry.md to describe the generation event fields accurately.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • packages/opencode/src/session/processor.ts: Emits generation telemetry on finish-step and tracks duration_ms via a step start timestamp.
  • packages/opencode/src/altimate/telemetry/index.ts: Updates the generation event type to use flat tokens_* fields.
  • docs/docs/reference/telemetry.md: Updates documentation for the generation event's fields and token breakdown.


Comment on lines +55 to +61
// Flat token fields — only present when data is available from the provider.
// No nested objects: Azure App Insights custom measures must be top-level numbers.
tokens_input: number
tokens_output: number
tokens_reasoning?: number // only for reasoning models
tokens_cache_read?: number // only when a cached prompt was reused
tokens_cache_write?: number // only when a new cache entry was written

Copilot AI Mar 20, 2026


Changing the generation telemetry event from a nested tokens object to flat tokens_* fields will break existing type-checked tests that still construct Telemetry.Event with tokens: { ... } (e.g. packages/opencode/test/session/processor.test.ts and packages/opencode/test/telemetry/telemetry.test.ts). Please update those tests to use tokens_input, tokens_output, etc., and adjust the envelope-flattening assertions accordingly (since tokens will no longer be present).

Suggested change
// Flat token fields — only present when data is available from the provider.
// No nested objects: Azure App Insights custom measures must be top-level numbers.
tokens_input: number
tokens_output: number
tokens_reasoning?: number // only for reasoning models
tokens_cache_read?: number // only when a cached prompt was reused
tokens_cache_write?: number // only when a new cache entry was written
tokens: TokensPayload

… fields

- Remove `TokensPayload` export (dead code since generation event now uses flat fields)
- Update processor.test.ts: construct generation event with flat tokens_* fields
- Update telemetry.test.ts: use flat tokens_* fields, rename test to reflect new shape

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai bot left a comment


🧹 Nitpick comments (2)
packages/opencode/test/session/processor.test.ts (1)

461-485: Assert the rest of the flattened token fields in this case.

Right now this only checks tokens_input and tokens_cache_read, so tokens_output, tokens_reasoning, and tokens_cache_write can regress without failing this processor-mapping coverage.

➕ Suggested assertions
     expect(event.model_id).toBe("claude-opus-4-6")
     expect(event.tokens_input).toBe(1000)
+    expect(event.tokens_output).toBe(500)
+    expect(event.tokens_reasoning).toBe(200)
     expect(event.tokens_cache_read).toBe(800)
+    expect(event.tokens_cache_write).toBe(100)
     expect(event.cost).toBe(0.05)
     expect(event.finish_reason).toBe("end_turn")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/processor.test.ts` around lines 461 - 485,
Update the "generation event contains all required fields" test to assert the
remaining flattened token fields so regressions are caught: add expect checks
for event.tokens_output, event.tokens_reasoning, and event.tokens_cache_write
(and keep existing assertions) inside the same test block that defines the event
variable; reference the test name and the event object when adding these expect
statements.
packages/opencode/test/telemetry/telemetry.test.ts (1)

627-660: Add the omit-when-unavailable case for optional token metrics.

This only exercises the all-fields-present path. A companion case where the event omits tokens_reasoning, tokens_cache_read, and tokens_cache_write would protect the new “missing, not zero” contract.

➕ Suggested test
   test("flat token fields appear in measurements", async () => {
     const { fetchCalls, cleanup } = await initWithMockedFetch()
     try {
       Telemetry.track({
         type: "generation",
@@
     } finally {
       cleanup()
     }
   })
+
+  test("optional token metrics stay omitted when not present on the event", async () => {
+    const { fetchCalls, cleanup } = await initWithMockedFetch()
+    try {
+      Telemetry.track({
+        type: "generation",
+        timestamp: 1700000000000,
+        session_id: "sess-1",
+        message_id: "msg-1",
+        model_id: "claude-3",
+        provider_id: "anthropic",
+        agent: "builder",
+        finish_reason: "end_turn",
+        tokens_input: 100,
+        tokens_output: 200,
+        cost: 0.01,
+        duration_ms: 2000,
+      })
+
+      await Telemetry.flush()
+
+      const measurements = JSON.parse(fetchCalls[0].body)[0].data.baseData.measurements
+      expect(measurements.tokens_reasoning).toBeUndefined()
+      expect(measurements.tokens_cache_read).toBeUndefined()
+      expect(measurements.tokens_cache_write).toBeUndefined()
+    } finally {
+      cleanup()
+    }
+  })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/telemetry/telemetry.test.ts` around lines 627 - 660,
Add a companion test for Telemetry.track that omits the optional token fields
tokens_reasoning, tokens_cache_read, and tokens_cache_write to verify the
"omit-when-unavailable" behavior: call Telemetry.track with the same required
fields but leave out those three, call await Telemetry.flush(), parse
fetchCalls[0].body to get envelopes[0].data.baseData.measurements, and assert
that measurements.tokens_reasoning, measurements.tokens_cache_read, and
measurements.tokens_cache_write are not present (or are undefined) while
tokens_input/tokens_output still appear; use the same
initWithMockedFetch/cleanup pattern as the existing flat token fields test to
scope the mocked fetch.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 9df61593-d829-4149-8b44-dcd10eef53d5

📥 Commits

Reviewing files that changed from the base of the PR and between 2038e2f and a389e18.

📒 Files selected for processing (3)
  • packages/opencode/src/altimate/telemetry/index.ts
  • packages/opencode/test/session/processor.test.ts
  • packages/opencode/test/telemetry/telemetry.test.ts

@dev-punia-altimate

✅ Tests — All Passed

TypeScript — passed

cc @suryaiyer95
Tested at a389e189 | Run log | Powered by QA Autopilot
