
feat(observability): log + persist SDK token usage on executions#209

Merged
chrisleekr merged 2 commits into
mainfrom
fix/issue-192
Jun 3, 2026

Conversation

@chrisleekr chrisleekr commented Jun 3, 2026

Owner

What

Closes #192. The Claude Agent SDK SDKResultMessage.usage carries four token counters plus a per-model modelUsage map, but the executor logged only the two cache counters and persisted none of them — only cost_usd / duration_ms / num_turns reached the executions table. Cost alone is opaque: a 500 KB-prompt / 2-turn run and a 5 KB-prompt / 50-turn run can bill the same costUsd. And the prompt-cache hit-ratio (cache_read / (input + cache_read + cache_creation)) — the load-bearing signal for cache stability (#134) — was uncomputable without the input_tokens denominator.
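The hit-ratio formula above can be sketched as a small helper. This is a minimal illustration, not code from the PR: the `TokenCounters` interface and the `cacheHitRatio` name are hypothetical, and returning `undefined` on a zero denominator is one possible way to avoid emitting NaN.

```typescript
// Hypothetical shape: the four scalar counters named in this PR, camelCased.
interface TokenCounters {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

// cache_read / (input + cache_read + cache_creation).
// Returns undefined when no prompt tokens were recorded, so a caller can
// skip the datapoint instead of plotting NaN.
function cacheHitRatio(c: TokenCounters): number | undefined {
  const denominator =
    c.inputTokens + c.cacheReadInputTokens + c.cacheCreationInputTokens;
  if (denominator <= 0) return undefined;
  return c.cacheReadInputTokens / denominator;
}

// Illustrative numbers only: 8,000 of 10,000 prompt tokens served from cache.
const warmRun = cacheHitRatio({
  inputTokens: 1_000,
  outputTokens: 500,
  cacheReadInputTokens: 8_000,
  cacheCreationInputTokens: 1_000,
});
```

Note that `outputTokens` does not enter the ratio; without `inputTokens` the denominator cannot be formed, which is the gap this PR closes.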

Changes (full log + persist thread)

  • src/types.ts: ModelUsageEntry + 5 optional token fields on ExecutionResult.
  • src/core/executor.ts: terminal log gains inputTokens / outputTokens / modelUsage (compact per-model array; compactModelUsage flattens the SDK Record<string, ModelUsage> and renames costUSD → costUsd); buildExecutionResult populates all five (exactOptionalPropertyTypes-safe).
  • src/core/pipeline.ts + src/core/log-fields.ts: the 4 scalar counters go on pipeline.completed, pinned by a new .strict() PipelineCompletedLogSchema with a co-located round-trip test.
  • src/shared/ws-messages.ts + src/daemon/job-executor.ts + src/orchestrator/connection-handler.ts + src/orchestrator/history.ts: thread the counters + modelUsage across the job:result WebSocket contract into markExecutionCompleted.
  • src/db/migrations/016_executions_tokens.sql: 4 nullable BIGINT columns + 1 nullable JSONB column (model_usage). BIGINT (not INTEGER) because usage is cumulative and cache_read accumulates as cached-prompt-size × turns, which can exceed INTEGER's 2.1B on a long run.
  • docs/operate/observability.md: the new fields, a "Token usage and the cache hit-ratio" subsection, and the cache-read-share regression alert.
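The flatten-and-rename step described for compactModelUsage might look roughly like the sketch below. It is not the PR's implementation: the `SdkModelUsage` interface is an assumed subset of the SDK's per-model entry (only the `costUSD` name is confirmed by the rename mentioned above), and the empty-map and sort behaviour follows the follow-up commit on this PR.

```typescript
// Assumed subset of one SDK per-model entry; field names other than costUSD
// are hypothetical.
interface SdkModelUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUSD: number;
}

interface ModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUsd: number;
}

function compactModelUsage(
  usage: Record<string, SdkModelUsage> | undefined,
): ModelUsageEntry[] | undefined {
  if (usage === undefined) return undefined;
  const entries: ModelUsageEntry[] = [];
  Object.keys(usage).forEach((model) => {
    const u = usage[model];
    if (u === undefined) return;
    entries.push({
      model,
      inputTokens: u.inputTokens,
      outputTokens: u.outputTokens,
      cacheReadInputTokens: u.cacheReadInputTokens,
      cacheCreationInputTokens: u.cacheCreationInputTokens,
      costUsd: u.costUSD, // rename costUSD -> costUsd
    });
  });
  // An empty map is treated as "no data" rather than persisted as [].
  if (entries.length === 0) return undefined;
  // Sort by model name so logs and stored JSON have a deterministic order.
  return entries.sort((a, b) =>
    a.model < b.model ? -1 : a.model > b.model ? 1 : 0,
  );
}
```

Copying fields explicitly, rather than spreading the SDK object, keeps any extra SDK keys out of the persisted array.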

Red → green

PipelineCompletedLogSchema did not exist; the new round-trip test pins the token fields. The executor test now asserts the token fields + the costUSD → costUsd rename land on the returned ExecutionResult. The WS round-trip test exercises the new payload fields + a malformed-modelUsage rejection.

Acceptance criteria (issue #192)

  1. ✅ Executor log carries inputTokens / outputTokens / modelUsage.
  2. ✅ Counters thread ExecutionResult → pipeline.completed → WS → markExecutionCompleted.
  3. ✅ pipeline.completed pinned by a strict schema + round-trip test.
  4. ✅ Migration 016 (4 BIGINT + 1 JSONB); markExecutionCompleted persists them.
  5. ✅ Docs: fields + cache hit-ratio + alert.
  6. ✅ Quality gates pass.
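To illustrate what "pinned by a strict schema" buys in criterion 3, here is a dependency-free sketch of the strictness property. The PR itself uses a Zod `.strict()` schema; this hand-rolled check is only a stand-in. The keys `event`, `success`, and `wallClockMs` are hypothetical, and treating counters as non-negative integers is an assumption about what "invalid token values" means.

```typescript
// Only the four counter names come from the PR; the required keys are
// hypothetical stand-ins for the real schema's fields.
const REQUIRED_KEYS = ["event", "success", "wallClockMs"];
const COUNTER_KEYS = [
  "inputTokens",
  "outputTokens",
  "cacheReadInputTokens",
  "cacheCreationInputTokens",
];

function isNonNegativeInteger(value: unknown): boolean {
  return (
    typeof value === "number" &&
    isFinite(value) &&
    value >= 0 &&
    Math.floor(value) === value
  );
}

// Returns a list of problems; an empty list means the log line is accepted.
function checkPipelineCompleted(log: Record<string, unknown>): string[] {
  const problems: string[] = [];
  if (log["event"] !== "pipeline.completed") {
    problems.push("event must be 'pipeline.completed'");
  }
  if (typeof log["success"] !== "boolean") {
    problems.push("success must be a boolean");
  }
  if (!isNonNegativeInteger(log["wallClockMs"])) {
    problems.push("wallClockMs must be a non-negative integer");
  }
  COUNTER_KEYS.forEach((key) => {
    if (key in log && !isNonNegativeInteger(log[key])) {
      problems.push(key + " must be a non-negative integer");
    }
  });
  // Strictness: any key outside the known set is an error, so a renamed or
  // misspelled field fails the round-trip test instead of passing silently.
  const allowed = REQUIRED_KEYS.concat(COUNTER_KEYS);
  Object.keys(log).forEach((key) => {
    if (allowed.indexOf(key) === -1) problems.push("unknown key: " + key);
  });
  return problems;
}
```

The unknown-key check is the part that makes the schema a contract: dashboards keyed on a field name break loudly in tests when that name drifts.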

Notes

  • Migration is 016 (issue said 015, but 015_review_learnings_embedding.sql is taken). Additive nullable columns; pre-existing rows stay NULL; dispatch_stats aggregates unaffected.
  • migrate.test.ts migration-count assertions bumped 15→16 + the new token columns asserted.
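The BIGINT choice for the new columns can be checked with quick arithmetic. This sketch uses the description's own model (cumulative cache_read grows as cached-prompt-size × turns); the 200,000-token prompt size and the function name are illustrative assumptions only.

```typescript
// Upper bound of a PostgreSQL INTEGER (signed 32-bit).
const INT4_MAX = 2_147_483_647;

// Smallest turn count at which cachedPromptTokens x turns exceeds INT4_MAX.
function turnsUntilInt4Overflow(cachedPromptTokens: number): number {
  return Math.floor(INT4_MAX / cachedPromptTokens) + 1;
}

// Illustrative prompt size, not a measured value.
const overflowTurns = turnsUntilInt4Overflow(200_000);
const lastSafeTotal = 200_000 * (overflowTurns - 1);
const firstOverflowTotal = 200_000 * overflowTurns;
```

With a BIGINT column the same product stays far below the column's limit, so the cumulative counter never needs clamping at the storage layer.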

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Token usage metrics now captured and persisted, including input/output tokens and cache token counters.
    • Per-model token usage breakdown available for improved cost tracking.
    • Cache metrics and hit-ratio monitoring enabled with documented alerting guidance.
  • Documentation

    • Updated observability documentation with expanded pipeline log field mappings and token usage metrics guidance.

Surface the input/output/cache token counters and per-model modelUsage
from SDKResultMessage that the executor dropped: log them, thread the four
scalar counters through ExecutionResult -> pipeline.completed -> the WS
job:result contract -> markExecutionCompleted, and persist them to new
executions columns (migration 016, BIGINT + JSONB). Unblocks prompt-size
and prompt-cache hit-ratio observability.

Closes #192

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 3, 2026 07:35
@coderabbitai

coderabbitai Bot commented Jun 3, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b7e79d0f-f341-4526-8be9-9defb8b0b8c5

📥 Commits

Reviewing files that changed from the base of the PR and between 98efdd0 and da62461.

📒 Files selected for processing (3)
  • docs/operate/observability.md
  • src/core/executor.ts
  • src/shared/ws-messages.ts
Walkthrough

This PR adds end-to-end observability for Claude Agent SDK token usage, cache metrics, and per-model costs. Token and cache data are extracted at execution time, threaded through the pipeline and daemon, persisted to the database with a new schema migration, validated through logging schemas, and documented for operators.

Changes

Token Usage and Cache Metrics Observability

| Layer / File(s) | Summary |
| --- | --- |
| Token usage type contracts (src/types.ts) | New ModelUsageEntry interface defines per-model token counts and cost; ExecutionResult extended with optional inputTokens, outputTokens, cacheReadInputTokens, cacheCreationInputTokens, and modelUsage array. |
| Executor token capture and compaction (src/core/executor.ts, test/core/executor.test.ts) | New compactModelUsage() helper transforms SDK's per-model map into ModelUsageEntry[]; buildExecutionResult() copies usage fields and nested model costs into result; completion log includes token metrics; test helper and new test case verify input/output token threading and model usage mapping. |
| Result threading through pipeline and daemon (src/core/pipeline.ts, src/daemon/job-executor.ts) | Token metrics from executeAgent result are added to pipeline.completed log payload; job:result daemon message conditionally includes token counts and modelUsage array for persistence. |
| Daemon-to-server message schema (src/shared/ws-messages.ts, test/shared/ws-messages.test.ts) | job:result Zod schema updated to accept optional top-level token counters and modelUsage array with per-model token breakdown; tests verify schema accepts valid entries and rejects malformed token data. |
| Orchestrator persistence and database schema (src/db/migrations/016_executions_tokens.sql, src/orchestrator/connection-handler.ts, src/orchestrator/history.ts, test/db/migrate.test.ts) | Migration adds nullable BIGINT columns (input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens) and model_usage JSONB to executions table; connection-handler and history extend result object to conditionally persist token fields; migration tests assert new columns present. |
| Core pipeline log schema validation (src/core/log-fields.ts, test/core/log-fields.test.ts) | PipelineCompletedLogSchema Zod schema validates pipeline.completed terminal logs with required event/success/wall-clock and optional token/cost/duration fields; tests verify parsing accepts full and minimal logs and rejects invalid token values. |
| Operator documentation (docs/operate/observability.md) | Core pipeline log section expanded with detailed event-key mappings, enumerated pipeline.stage names, token counter descriptions, cache hit-ratio formula (cache_read / (input + cache_read + cache_creation)), and alerting guidance based on cache-read share baseline. |
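The "conditionally includes" and exactOptionalPropertyTypes-safe wording above refers to omitting absent fields rather than setting them to undefined. A minimal sketch of that pattern follows; the `SdkUsage` interface is a hypothetical subset of the SDK usage object and `pickTokenFields` is an illustrative name, not a function from the PR.

```typescript
interface TokenFields {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}

// Hypothetical subset of the SDK usage object, using the snake_case counter
// names that appear in this PR.
interface SdkUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Copies only the counters that are present. Under exactOptionalPropertyTypes,
// `{ inputTokens: undefined }` is not assignable to `inputTokens?: number`,
// so each key is assigned only when a value exists.
function pickTokenFields(usage: SdkUsage | undefined): TokenFields {
  const u: SdkUsage = usage ?? {};
  const out: TokenFields = {};
  if (u.input_tokens !== undefined) out.inputTokens = u.input_tokens;
  if (u.output_tokens !== undefined) out.outputTokens = u.output_tokens;
  if (u.cache_read_input_tokens !== undefined) {
    out.cacheReadInputTokens = u.cache_read_input_tokens;
  }
  if (u.cache_creation_input_tokens !== undefined) {
    out.cacheCreationInputTokens = u.cache_creation_input_tokens;
  }
  return out;
}
```

Omitting the key entirely also keeps serialized payloads and strict schemas free of explicit undefined values.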

Sequence Diagram(s)

sequenceDiagram
  participant SDK as Claude Agent SDK
  participant Executor
  participant Pipeline
  participant Daemon as Job Executor
  participant ConnectionHandler
  participant History as markExecutionCompleted
  participant Database
  SDK->>Executor: modelUsage map, input_tokens, output_tokens
  Executor->>Executor: compactModelUsage()
  Executor->>Pipeline: ExecutionResult with tokens
  Pipeline->>Daemon: token metrics in completion log
  Daemon->>ConnectionHandler: job:result with inputTokens, modelUsage
  ConnectionHandler->>History: result object with token fields
  History->>Database: UPDATE executions SET input_tokens, output_tokens, model_usage
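The final hop in the diagram, from the result object to the executions columns, could be sketched as a pure mapping. This is an assumption-laden illustration: the column names come from migration 016 as described in this PR, while `CompletedResult`, `toTokenColumns`, and serializing model_usage with `JSON.stringify` are hypothetical choices that depend on the database driver in use.

```typescript
interface ModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUsd: number;
}

// Hypothetical slice of the result passed to markExecutionCompleted.
interface CompletedResult {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
  modelUsage?: ModelUsageEntry[];
}

interface TokenColumns {
  input_tokens: number | null;
  output_tokens: number | null;
  cache_read_input_tokens: number | null;
  cache_creation_input_tokens: number | null;
  model_usage: string | null;
}

// Absent fields become NULL, matching the nullable columns; a counter of 0
// is kept as 0 because `??` only replaces null and undefined.
function toTokenColumns(r: CompletedResult): TokenColumns {
  return {
    input_tokens: r.inputTokens ?? null,
    output_tokens: r.outputTokens ?? null,
    cache_read_input_tokens: r.cacheReadInputTokens ?? null,
    cache_creation_input_tokens: r.cacheCreationInputTokens ?? null,
    model_usage:
      r.modelUsage !== undefined ? JSON.stringify(r.modelUsage) : null,
  };
}
```

Keeping NULL distinct from 0 matters for the ratio queries: a NULL row predates the migration or lacked usage data, while a 0 row is a real measurement.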

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • #192: The changes directly implement token usage and cache metrics observability by adding ModelUsageEntry type, extracting metrics via compactModelUsage(), threading through pipeline/daemon, persisting with migration 016, validating with PipelineCompletedLogSchema, and documenting cache hit-ratio formulas and alerting guidance.

Suggested labels

type: feature ✨, type: docs 📋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(observability): log + persist SDK token usage on executions' clearly and specifically summarizes the main change: capturing and persisting Claude Agent SDK token usage metrics throughout the execution pipeline. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Copilot AI left a comment


Pull request overview

This PR improves observability for Claude Agent SDK executions by logging and persisting full token-usage telemetry (including per-model breakdown) so operators can distinguish cost drivers and compute prompt-cache hit ratios over time.

Changes:

  • Extend execution result types to carry token counters and a per-model modelUsage array (with costUSD → costUsd normalization).
  • Add strict schema + tests for pipeline.completed log shape including token counters, and thread token fields through the daemon WS job:result contract into DB persistence.
  • Add migration 016_executions_tokens to persist four token counters (BIGINT) plus model_usage (JSONB) on executions, and update docs with cache hit-ratio guidance.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/shared/ws-messages.test.ts | Extends WS schema tests to include token fields and rejects malformed modelUsage entries. |
| test/db/migrate.test.ts | Updates migration count expectations and asserts new executions token columns exist. |
| test/core/log-fields.test.ts | Adds round-trip tests for strict PipelineCompletedLogSchema including token counters. |
| test/core/executor.test.ts | Asserts executor threads token fields + compacts/renames modelUsage correctly into ExecutionResult and logs. |
| src/types.ts | Introduces ModelUsageEntry and adds optional token usage fields to ExecutionResult. |
| src/shared/ws-messages.ts | Extends job:result schema to accept token counters and modelUsage array. |
| src/orchestrator/history.ts | Persists token counters and model_usage JSONB into executions on completion. |
| src/orchestrator/connection-handler.ts | Threads token fields + modelUsage from WS payload into markExecutionCompleted. |
| src/db/migrations/016_executions_tokens.sql | Adds BIGINT token columns and JSONB model_usage to executions. |
| src/daemon/job-executor.ts | Includes token counters + modelUsage in daemon job:result messages when present. |
| src/core/pipeline.ts | Logs token counters on pipeline.completed event. |
| src/core/log-fields.ts | Adds strict PipelineCompletedLogSchema including token counters. |
| src/core/executor.ts | Logs inputTokens/outputTokens and compacts SDK modelUsage map into persisted array form. |
| docs/operate/observability.md | Documents new fields and explains cache hit-ratio metric + alerting approach. |

Comment thread src/core/executor.ts Outdated
Comment thread src/shared/ws-messages.ts Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/operate/observability.md`:
- Around line 57-63: Mention that the cache-hit ratio formula
cache_read_input_tokens / (input_tokens + cache_read_input_tokens +
cache_creation_input_tokens) can produce NaN if the denominator is zero; update
the docs to instruct dashboard/alert queries to guard by checking the
denominator > 0 (or using a safe-coalesce/default value) before division and to
treat the metric as undefined/ignore the datapoint when all three counters are
zero; also update the example alert expression sum(cache_read_input_tokens) /
sum(input_tokens + cache_read_input_tokens + cache_creation_input_tokens) to
include the same zero-denominator guard.

In `@src/shared/ws-messages.ts`:
- Line 394: The `model` Zod schema currently allows empty strings; update the
`model: z.string()` declaration in src/shared/ws-messages.ts to validate
non-empty values by changing it to use `.min(1)` (e.g., `model:
z.string().min(1)`), optionally supplying a short error message; this adds
defensive validation at the schema boundary to reject empty model names.
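
The zero-denominator guard requested for the aggregate alert expression can be sketched as follows. This is an illustration only: the row shape mirrors the nullable columns from migration 016, while `ExecutionTokenRow` and `cacheReadShare` are hypothetical names, and skipping rows with NULL counters is an assumption about how pre-migration rows should be handled.

```typescript
// Row shape mirroring the nullable token columns on executions.
interface ExecutionTokenRow {
  input_tokens: number | null;
  cache_read_input_tokens: number | null;
  cache_creation_input_tokens: number | null;
}

// sum(cache_read) / sum(input + cache_read + cache_creation), guarded so
// that an empty or all-zero window yields undefined rather than NaN.
function cacheReadShare(rows: ExecutionTokenRow[]): number | undefined {
  let read = 0;
  let total = 0;
  rows.forEach((row) => {
    // Rows written before the migration carry NULL counters; skip them.
    if (
      row.input_tokens === null ||
      row.cache_read_input_tokens === null ||
      row.cache_creation_input_tokens === null
    ) {
      return;
    }
    read += row.cache_read_input_tokens;
    total +=
      row.input_tokens +
      row.cache_read_input_tokens +
      row.cache_creation_input_tokens;
  });
  return total > 0 ? read / total : undefined;
}
```

An alert built on this value would treat undefined as "no data" and not as a regression.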

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a60f8844-c2dc-4a4a-a70d-cbb500fce99a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0e8ef and 98efdd0.

📒 Files selected for processing (14)
  • docs/operate/observability.md
  • src/core/executor.ts
  • src/core/log-fields.ts
  • src/core/pipeline.ts
  • src/daemon/job-executor.ts
  • src/db/migrations/016_executions_tokens.sql
  • src/orchestrator/connection-handler.ts
  • src/orchestrator/history.ts
  • src/shared/ws-messages.ts
  • src/types.ts
  • test/core/executor.test.ts
  • test/core/log-fields.test.ts
  • test/db/migrate.test.ts
  • test/shared/ws-messages.test.ts

Comment thread docs/operate/observability.md Outdated
Comment thread src/shared/ws-messages.ts Outdated
Treat an empty SDK modelUsage map as undefined (omit rather than persist
[]), sort entries by model for deterministic order, require model.min(1)
on the WS schema, and note the zero-denominator cache-ratio edge case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

feat(observability): log input/output tokens + modelUsage from SDKResultMessage; persist to executions table

2 participants