feat(observability): log + persist SDK token usage on executions #209
Surface the input/output/cache token counters and per-model modelUsage from SDKResultMessage that the executor dropped: log them, thread the four scalar counters through ExecutionResult -> pipeline.completed -> the WS job:result contract -> markExecutionCompleted, and persist them to new executions columns (migration 016, BIGINT + JSONB). Unblocks prompt-size and prompt-cache hit-ratio observability.
Closes #192
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 Walkthrough
This PR adds end-to-end observability for Claude Agent SDK token usage, cache metrics, and per-model costs. Token and cache data are extracted at execution time, threaded through the pipeline and daemon, persisted to the database with a new schema migration, validated through logging schemas, and documented for operators.
Changes
Token Usage and Cache Metrics Observability
Sequence Diagram(s)
sequenceDiagram
participant SDK as Claude Agent SDK
participant Executor
participant Pipeline
participant Daemon as Job Executor
participant ConnectionHandler
participant History as markExecutionCompleted
participant Database
SDK->>Executor: modelUsage map, input_tokens, output_tokens
Executor->>Executor: compactModelUsage()
Executor->>Pipeline: ExecutionResult with tokens
Pipeline->>Daemon: token metrics in completion log
Daemon->>ConnectionHandler: job:result with inputTokens, modelUsage
ConnectionHandler->>History: result object with token fields
History->>Database: UPDATE executions SET input_tokens, output_tokens, model_usage
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related issues
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR improves observability for Claude Agent SDK executions by logging and persisting full token-usage telemetry (including per-model breakdown) so operators can distinguish cost drivers and compute prompt-cache hit ratios over time.
Changes:
- Extend execution result types to carry token counters and a per-model `modelUsage` array (with `costUSD` → `costUsd` normalization).
- Add strict schema + tests for `pipeline.completed` log shape including token counters, and thread token fields through the daemon WS `job:result` contract into DB persistence.
- Add migration `016_executions_tokens` to persist four token counters (BIGINT) plus `model_usage` (JSONB) on `executions`, and update docs with cache hit-ratio guidance.
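As a rough illustration of the first change, the optional token fields can be populated only when the SDK actually reported a value, so a missing counter is omitted rather than stored as `undefined`. This is a sketch under assumptions: the camelCase names `cacheReadInputTokens` and `cacheCreationInputTokens`, the `SdkUsage` subset, and the helper `tokenFields` are hypothetical and not taken from the PR.

```typescript
// Hypothetical subset of the SDK usage object (snake_case counters).
interface SdkUsage {
  input_tokens?: number;
  output_tokens?: number;
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

// Hypothetical token-related slice of ExecutionResult.
interface ExecutionResultTokens {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}

// Copy each counter only when it is present, so absent values are
// omitted instead of being written as explicit `undefined`.
function tokenFields(usage: SdkUsage | undefined): ExecutionResultTokens {
  const result: ExecutionResultTokens = {};
  const input = usage?.input_tokens;
  const output = usage?.output_tokens;
  const cacheRead = usage?.cache_read_input_tokens;
  const cacheCreation = usage?.cache_creation_input_tokens;
  if (input !== undefined) result.inputTokens = input;
  if (output !== undefined) result.outputTokens = output;
  if (cacheRead !== undefined) result.cacheReadInputTokens = cacheRead;
  if (cacheCreation !== undefined) result.cacheCreationInputTokens = cacheCreation;
  return result;
}
```

Omitting absent keys keeps the object compatible with `exactOptionalPropertyTypes`, under which an optional property may be missing but may not be assigned `undefined`.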
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/shared/ws-messages.test.ts | Extends WS schema tests to include token fields and rejects malformed modelUsage entries. |
| test/db/migrate.test.ts | Updates migration count expectations and asserts new executions token columns exist. |
| test/core/log-fields.test.ts | Adds round-trip tests for strict PipelineCompletedLogSchema including token counters. |
| test/core/executor.test.ts | Asserts executor threads token fields + compacts/renames modelUsage correctly into ExecutionResult and logs. |
| src/types.ts | Introduces ModelUsageEntry and adds optional token usage fields to ExecutionResult. |
| src/shared/ws-messages.ts | Extends job:result schema to accept token counters and modelUsage array. |
| src/orchestrator/history.ts | Persists token counters and model_usage JSONB into executions on completion. |
| src/orchestrator/connection-handler.ts | Threads token fields + modelUsage from WS payload into markExecutionCompleted. |
| src/db/migrations/016_executions_tokens.sql | Adds BIGINT token columns and JSONB model_usage to executions. |
| src/daemon/job-executor.ts | Includes token counters + modelUsage in daemon job:result messages when present. |
| src/core/pipeline.ts | Logs token counters on pipeline.completed event. |
| src/core/log-fields.ts | Adds strict PipelineCompletedLogSchema including token counters. |
| src/core/executor.ts | Logs inputTokens/outputTokens and compacts SDK modelUsage map into persisted array form. |
| docs/operate/observability.md | Documents new fields and explains cache hit-ratio metric + alerting approach. |
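The malformed-`modelUsage` rejection that the WS schema tests exercise can be pictured with a dependency-free validator. This is a hand-written stand-in rather than the repository's schema code, and the entry fields other than `model` and `costUsd`, as well as the names `isModelUsageEntry` and `parseModelUsage`, are assumptions made for the sketch.

```typescript
// Assumed entry shape; only `model` and `costUsd` are named in the PR.
interface ModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// A token counter must be a non-negative integer.
function isCount(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}

function isModelUsageEntry(value: unknown): value is ModelUsageEntry {
  if (typeof value !== "object" || value === null) return false;
  const e = value as Record<string, unknown>;
  const model = e["model"];
  const cost = e["costUsd"];
  return (
    typeof model === "string" &&
    model.length > 0 && // reject empty model names at the boundary
    isCount(e["inputTokens"]) &&
    isCount(e["outputTokens"]) &&
    typeof cost === "number" &&
    Number.isFinite(cost) &&
    cost >= 0
  );
}

// An absent field is allowed; a present but malformed one is rejected.
function parseModelUsage(payload: unknown): ModelUsageEntry[] | undefined {
  if (payload === undefined) return undefined;
  if (!Array.isArray(payload) || !payload.every(isModelUsageEntry)) {
    throw new Error("malformed modelUsage");
  }
  return payload;
}
```

Rejecting the whole payload on a single bad entry keeps partially valid telemetry out of the `model_usage` column, at the cost of dropping the good entries alongside it.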
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/operate/observability.md`:
- Around line 57-63: Mention that the cache-hit ratio formula
cache_read_input_tokens / (input_tokens + cache_read_input_tokens +
cache_creation_input_tokens) can produce NaN if the denominator is zero; update
the docs to instruct dashboard/alert queries to guard by checking the
denominator > 0 (or using a safe-coalesce/default value) before division and to
treat the metric as undefined/ignore the datapoint when all three counters are
zero; also update the example alert expression sum(cache_read_input_tokens) /
sum(input_tokens + cache_read_input_tokens + cache_creation_input_tokens) to
include the same zero-denominator guard.
In `@src/shared/ws-messages.ts`:
- Line 394: The `model` Zod schema currently allows empty strings; update the
`model: z.string()` declaration in src/shared/ws-messages.ts to validate
non-empty values by changing it to use `.min(1)` (e.g., `model:
z.string().min(1)`), optionally supplying a short error message; this adds
defensive validation at the schema boundary to reject empty model names.
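The zero-denominator guard requested for the cache hit-ratio could look like the sketch below. Only the formula comes from the review comment; the `TokenCounters` shape and the `cacheHitRatio` name are hypothetical.

```typescript
// Hypothetical camelCase view of the three counters in the formula.
interface TokenCounters {
  inputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

// cache_read / (input + cache_read + cache_creation), or undefined
// when all three counters are zero and the ratio has no meaning.
function cacheHitRatio(c: TokenCounters): number | undefined {
  const denominator =
    c.inputTokens + c.cacheReadInputTokens + c.cacheCreationInputTokens;
  if (denominator <= 0) return undefined;
  return c.cacheReadInputTokens / denominator;
}
```

Returning `undefined` instead of `0` lets a dashboard skip the datapoint, so an execution with no prompt tokens is not mistaken for a complete cache miss.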
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a60f8844-c2dc-4a4a-a70d-cbb500fce99a
📒 Files selected for processing (14)
docs/operate/observability.md
src/core/executor.ts
src/core/log-fields.ts
src/core/pipeline.ts
src/daemon/job-executor.ts
src/db/migrations/016_executions_tokens.sql
src/orchestrator/connection-handler.ts
src/orchestrator/history.ts
src/shared/ws-messages.ts
src/types.ts
test/core/executor.test.ts
test/core/log-fields.test.ts
test/db/migrate.test.ts
test/shared/ws-messages.test.ts
Treat an empty SDK modelUsage map as undefined (omit rather than persist []), sort entries by model for deterministic order, require model.min(1) on the WS schema, and note the zero-denominator cache-ratio edge case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
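The compaction behaviour this commit describes (empty map becomes `undefined`, entries sorted by model, `costUSD` renamed to `costUsd`) might be sketched as follows. The per-model fields other than `costUSD` are assumed for illustration, and this is not the repository's `compactModelUsage` implementation.

```typescript
// Assumed shape of one per-model record in the SDK map; only `costUSD`
// is named in the PR.
interface SdkModelUsage {
  inputTokens: number;
  outputTokens: number;
  costUSD: number;
}

// Compact array entry as persisted, with the renamed cost field.
interface ModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

function compactModelUsage(
  usage: Record<string, SdkModelUsage> | undefined,
): ModelUsageEntry[] | undefined {
  if (usage === undefined) return undefined;
  const entries = Object.entries(usage).map(([model, u]) => ({
    model,
    inputTokens: u.inputTokens,
    outputTokens: u.outputTokens,
    costUsd: u.costUSD, // rename costUSD -> costUsd
  }));
  // An empty map is omitted rather than persisted as [].
  if (entries.length === 0) return undefined;
  // Plain code-unit comparison keeps the order independent of locale.
  entries.sort((a, b) => (a.model < b.model ? -1 : a.model > b.model ? 1 : 0));
  return entries;
}
```

Sorting by model name means two executions with the same usage produce identical JSON, which keeps test assertions and diffs of `model_usage` stable.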
What
Closes #192. The Claude Agent SDK `SDKResultMessage.usage` carries four token counters plus a per-model `modelUsage` map, but the executor logged only the two cache counters and persisted none of them; only `cost_usd`/`duration_ms`/`num_turns` reached the `executions` table. Cost alone is opaque: a 500 KB-prompt / 2-turn run and a 5 KB-prompt / 50-turn run can bill the same `costUsd`. And the prompt-cache hit-ratio (`cache_read / (input + cache_read + cache_creation)`), the load-bearing signal for cache stability (#134), was uncomputable without the `input_tokens` denominator.
Changes (full log + persist thread)
- `src/types.ts`: `ModelUsageEntry` + 5 optional token fields on `ExecutionResult`.
- `src/core/executor.ts`: terminal log gains `inputTokens`/`outputTokens`/`modelUsage` (compact per-model array; `compactModelUsage` flattens the SDK `Record<string, ModelUsage>` and renames `costUSD` → `costUsd`); `buildExecutionResult` populates all five (`exactOptionalPropertyTypes`-safe).
- `src/core/pipeline.ts` + `src/core/log-fields.ts`: the 4 scalar counters go on `pipeline.completed`, pinned by a new `.strict()` `PipelineCompletedLogSchema` with a co-located round-trip test.
- `src/shared/ws-messages.ts` + `src/daemon/job-executor.ts` + `src/orchestrator/connection-handler.ts` + `src/orchestrator/history.ts`: thread the counters + `modelUsage` across the `job:result` WebSocket contract into `markExecutionCompleted`.
- `src/db/migrations/016_executions_tokens.sql`: 4 nullable `BIGINT` + 1 nullable `JSONB` (`model_usage`) column. `BIGINT` (not `INTEGER`) because `usage` is cumulative and `cache_read` accumulates as cached-prompt-size × turns, which can exceed `INTEGER`'s 2.1B on a long run.
- `docs/operate/observability.md`: the new fields, a "Token usage and the cache hit-ratio" subsection, and the cache-read-share regression alert.
Red → green
`PipelineCompletedLogSchema` did not exist; the new round-trip test pins the token fields. The executor test now asserts the token fields + the `costUSD` → `costUsd` rename land on the returned `ExecutionResult`. The WS round-trip test exercises the new payload fields + a malformed-`modelUsage` rejection.
Acceptance criteria (issue #192)
- `inputTokens`/`outputTokens`/`modelUsage`.
- `ExecutionResult` → `pipeline.completed` → WS → `markExecutionCompleted`.
- `pipeline.completed` pinned by a strict schema + round-trip test.
- `markExecutionCompleted` persists them.
Notes
`016` (issue said `015`, but `015_review_learnings_embedding.sql` is taken). Additive nullable columns; pre-existing rows stay NULL; `dispatch_stats` aggregates unaffected. `migrate.test.ts` migration-count assertions bumped 15→16 + the new token columns asserted.
🤖 Generated with Claude Code
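To make the `BIGINT` choice in the description concrete, the arithmetic below checks when a cumulative cache-read counter would no longer fit a 32-bit `INTEGER` column. The prompt size and turn counts are made-up illustration values, not measurements from this project, and the helper names are hypothetical.

```typescript
// Upper bound of a signed 32-bit PostgreSQL INTEGER column.
const INT32_MAX = 2147483647;

// Cumulative cache-read tokens if every turn re-reads the same cached
// prompt (cached-prompt-size × turns, as described in the PR).
function cumulativeCacheRead(cachedPromptTokens: number, turns: number): number {
  return cachedPromptTokens * turns;
}

// True when the value could be stored in an INTEGER column.
function fitsInInteger(value: number): boolean {
  return Number.isSafeInteger(value) && value <= INT32_MAX;
}

// Illustrative numbers only: a 200,000-token cached prompt.
const shortRun = cumulativeCacheRead(200000, 10000); // 2,000,000,000
const longRun = cumulativeCacheRead(200000, 11000); // 2,200,000,000

console.log(fitsInInteger(shortRun), fitsInInteger(longRun));
```

With these example values the shorter run still fits and the longer one does not, which is the overflow the nullable `BIGINT` columns are meant to rule out.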