feat(observability): log + persist SDK token usage on executions by chrisleekr · Pull Request #209 · chrisleekr/github-app-playground

chrisleekr · 2026-06-03T07:35:21Z

What

Closes #192. The Claude Agent SDK's SDKResultMessage carries four token counters in usage plus a per-model modelUsage map, but the executor logged only the two cache counters and persisted none of them: only cost_usd / duration_ms / num_turns reached the executions table. Cost alone is opaque, because a 500 KB-prompt / 2-turn run and a 5 KB-prompt / 50-turn run can bill the same costUsd. And the prompt-cache hit ratio (cache_read / (input + cache_read + cache_creation)), the load-bearing signal for cache stability (#134), was uncomputable without the input_tokens denominator.
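As a worked instance of the hit-ratio definition above, here is a minimal TypeScript sketch. It assumes the snake_case counter names of the Anthropic API usage object (input_tokens, cache_read_input_tokens, cache_creation_input_tokens); cacheHitRatio is a hypothetical helper, not code from this PR. It returns undefined instead of NaN when the denominator is zero, the edge case raised later in review.

```typescript
/** Token counters as they appear on the SDK result's usage object (assumed names). */
interface UsageCounters {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens: number;
  cache_creation_input_tokens: number;
}

/**
 * Hypothetical helper: prompt-cache hit ratio for one run,
 * cache_read / (input + cache_read + cache_creation).
 * Returns undefined when all three prompt-side counters are zero,
 * so callers can drop the datapoint instead of charting NaN.
 */
function cacheHitRatio(u: UsageCounters): number | undefined {
  // output_tokens is deliberately excluded: it is not prompt-side.
  const denominator =
    u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
  if (denominator <= 0) return undefined;
  return u.cache_read_input_tokens / denominator;
}
```

For a run with 1,000 uncached input tokens, 8,000 cache-read tokens and 1,000 cache-creation tokens, the ratio is 8,000 / 10,000 = 0.8.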

Changes (full log + persist thread)

src/types.ts — ModelUsageEntry + 5 optional token fields on ExecutionResult.
src/core/executor.ts — terminal log gains inputTokens / outputTokens / modelUsage (compact per-model array; compactModelUsage flattens the SDK Record<string, ModelUsage> and renames costUSD→costUsd); buildExecutionResult populates all five (exactOptionalPropertyTypes-safe).
src/core/pipeline.ts + src/core/log-fields.ts — the 4 scalar counters go on pipeline.completed, pinned by a new .strict() PipelineCompletedLogSchema with a co-located round-trip test.
src/shared/ws-messages.ts + src/daemon/job-executor.ts + src/orchestrator/connection-handler.ts + src/orchestrator/history.ts — thread the counters + modelUsage across the job:result WebSocket contract into markExecutionCompleted.
src/db/migrations/016_executions_tokens.sql — 4 nullable BIGINT + 1 nullable JSONB (model_usage) column. BIGINT (not INTEGER) because usage is cumulative and cache_read accumulates as cached-prompt-size × turns, which can exceed INTEGER's 2.1B on a long run.
docs/operate/observability.md — the new fields, a "Token usage and the cache hit-ratio" subsection, and the cache-read-share regression alert.
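The compaction step described for src/core/executor.ts can be sketched as follows. This is a hypothetical reconstruction, not the PR's code: SdkModelUsage assumes a subset of the SDK's camelCase per-model fields, and the entry shape is illustrative. It also folds in the follow-up commit's behavior (an empty map becomes undefined; entries are sorted by model).

```typescript
/** Assumed subset of the SDK's per-model usage record. */
interface SdkModelUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUSD: number;
}

/** Illustrative persisted shape: one entry per model, costUSD renamed to costUsd. */
interface ModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUsd: number;
}

/** Flatten Record<model, usage> into a sorted array, or undefined when there is no data. */
function compactModelUsage(
  modelUsage: Record<string, SdkModelUsage> | undefined,
): ModelUsageEntry[] | undefined {
  if (modelUsage === undefined) return undefined;
  const entries = Object.entries(modelUsage).map(
    ([model, u]): ModelUsageEntry => ({
      model,
      inputTokens: u.inputTokens,
      outputTokens: u.outputTokens,
      cacheReadInputTokens: u.cacheReadInputTokens,
      cacheCreationInputTokens: u.cacheCreationInputTokens,
      costUsd: u.costUSD, // the costUSD -> costUsd rename
    }),
  );
  // Treat an empty map as "no data": omit rather than persist [].
  if (entries.length === 0) return undefined;
  // Code-unit ordering by model name keeps the JSONB payload deterministic.
  return entries.sort((a, b) => (a.model < b.model ? -1 : a.model > b.model ? 1 : 0));
}
```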

Red → green

PipelineCompletedLogSchema did not exist; the new round-trip test pins the token fields. The executor test now asserts the token fields + the costUSD→costUsd rename land on the returned ExecutionResult. The WS round-trip test exercises the new payload fields + a malformed-modelUsage rejection.
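The malformed-modelUsage rejection relies on the Zod schema in src/shared/ws-messages.ts (including model: z.string().min(1) after review). To keep this sketch dependency-free, it swaps Zod for a plain TypeScript type guard that mirrors those checks; the non-negative-integer rule for counters and the costUsd rule are assumptions about what "malformed" means, not confirmed from the PR.

```typescript
/** Hypothetical wire shape of one job:result modelUsage entry. */
interface WireModelUsageEntry {
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
  costUsd: number;
}

// Counter keys that must all be present and well-formed.
const COUNTER_KEYS = [
  "inputTokens",
  "outputTokens",
  "cacheReadInputTokens",
  "cacheCreationInputTokens",
] as const;

/** A token counter must be a non-negative integer (assumed rule). */
function isTokenCount(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v >= 0;
}

/** Plain type guard standing in for the Zod entry schema. */
function isModelUsageEntry(v: unknown): v is WireModelUsageEntry {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  // Mirrors model: z.string().min(1): reject missing or empty model names.
  const model = o["model"];
  if (typeof model !== "string" || model.length === 0) return false;
  if (!COUNTER_KEYS.every((k) => isTokenCount(o[k]))) return false;
  // Cost must be a finite, non-negative number (assumed rule).
  const cost = o["costUsd"];
  return typeof cost === "number" && Number.isFinite(cost) && cost >= 0;
}
```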

Acceptance criteria (issue #192)

✅ Executor log carries inputTokens / outputTokens / modelUsage.
✅ Counters thread ExecutionResult → pipeline.completed → WS → markExecutionCompleted.
✅ pipeline.completed pinned by a strict schema + round-trip test.
✅ Migration 016 (4 BIGINT + 1 JSONB); markExecutionCompleted persists them.
✅ Docs: fields + cache hit-ratio + alert.
✅ Quality gates pass.
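One way to thread the optional counters across these hops while staying exactOptionalPropertyTypes-safe is to copy only the keys that are defined. A minimal sketch; pickTokenFields is a hypothetical helper, and the PR may inline this logic differently at each hop.

```typescript
/** Optional counters as carried on ExecutionResult and job:result (assumed names). */
interface TokenFields {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}

const TOKEN_KEYS = [
  "inputTokens",
  "outputTokens",
  "cacheReadInputTokens",
  "cacheCreationInputTokens",
] as const;

/**
 * Copy only the counters that are defined. Under exactOptionalPropertyTypes,
 * an explicit inputTokens: undefined is not assignable to inputTokens?: number,
 * so absent counters are omitted instead of written as undefined.
 */
function pickTokenFields(src: TokenFields): TokenFields {
  const out: TokenFields = {};
  for (const key of TOKEN_KEYS) {
    const value = src[key];
    // Compare against undefined, not truthiness, so a real 0 is kept.
    if (value !== undefined) out[key] = value;
  }
  return out;
}
```

The undefined check matters: a legitimate 0, such as a run with no cache writes, must survive every hop.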

Notes

Migration is 016 (issue said 015, but 015_review_learnings_embedding.sql is taken). Additive nullable columns; pre-existing rows stay NULL; dispatch_stats aggregates unaffected.
migrate.test.ts migration-count assertions bumped 15→16 + the new token columns asserted.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Token usage metrics now captured and persisted, including input/output tokens and cache token counters.
- Per-model token usage breakdown available for improved cost tracking.
- Cache metrics and hit-ratio monitoring enabled with documented alerting guidance.
Documentation
- Updated observability documentation with expanded pipeline log field mappings and token usage metrics guidance.

Surface the input/output/cache token counters and per-model modelUsage from SDKResultMessage that the executor dropped: log them, thread the four scalar counters through ExecutionResult -> pipeline.completed -> the WS job:result contract -> markExecutionCompleted, and persist them to new executions columns (migration 016, BIGINT + JSONB). Unblocks prompt-size and prompt-cache hit-ratio observability. Closes #192 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai · 2026-06-03T07:35:32Z

Warning

Review limit reached

@chrisleekr, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 32 minutes and 39 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b7e79d0f-f341-4526-8be9-9defb8b0b8c5

📥 Commits

Reviewing files that changed from the base of the PR and between 98efdd0 and da62461.

📒 Files selected for processing (3)

docs/operate/observability.md
src/core/executor.ts
src/shared/ws-messages.ts

📝 Walkthrough

Walkthrough

This PR adds end-to-end observability for Claude Agent SDK token usage, cache metrics, and per-model costs. Token and cache data are extracted at execution time, threaded through the pipeline and daemon, persisted to the database with a new schema migration, validated through logging schemas, and documented for operators.

Changes

Token Usage and Cache Metrics Observability

Layer / File(s)	Summary
Token usage type contracts `src/types.ts`	New `ModelUsageEntry` interface defines per-model token counts and cost; `ExecutionResult` extended with optional `inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheCreationInputTokens`, and `modelUsage` array.
Executor token capture and compaction `src/core/executor.ts`, `test/core/executor.test.ts`	New `compactModelUsage()` helper transforms SDK's per-model map into `ModelUsageEntry[]`; `buildExecutionResult()` copies `usage` fields and nested model costs into result; completion log includes token metrics; test helper and new test case verify input/output token threading and model usage mapping.
Result threading through pipeline and daemon `src/core/pipeline.ts`, `src/daemon/job-executor.ts`	Token metrics from `executeAgent` result are added to `pipeline.completed` log payload; `job:result` daemon message conditionally includes token counts and `modelUsage` array for persistence.
Daemon-to-server message schema `src/shared/ws-messages.ts`, `test/shared/ws-messages.test.ts`	`job:result` Zod schema updated to accept optional top-level token counters and `modelUsage` array with per-model token breakdown; tests verify schema accepts valid entries and rejects malformed token data.
Orchestrator persistence and database schema `src/db/migrations/016_executions_tokens.sql`, `src/orchestrator/connection-handler.ts`, `src/orchestrator/history.ts`, `test/db/migrate.test.ts`	Migration adds nullable `BIGINT` columns (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`) and `model_usage` JSONB to `executions` table; `connection-handler` and `history` extend result object to conditionally persist token fields; migration tests assert new columns present.
Core pipeline log schema validation `src/core/log-fields.ts`, `test/core/log-fields.test.ts`	`PipelineCompletedLogSchema` Zod schema validates `pipeline.completed` terminal logs with required event/success/wall-clock and optional token/cost/duration fields; tests verify parsing accepts full and minimal logs and rejects invalid token values.
Operator documentation `docs/operate/observability.md`	Core pipeline log section expanded with detailed event-key mappings, enumerated `pipeline.stage` names, token counter descriptions, cache hit-ratio formula (`cache_read / (input + cache_read + cache_creation)`), and alerting guidance based on a cache-read share baseline.

Sequence Diagram(s)

sequenceDiagram
  participant SDK as Claude Agent SDK
  participant Executor
  participant Pipeline
  participant Daemon as Job Executor
  participant ConnectionHandler
  participant History as markExecutionCompleted
  participant Database
  SDK->>Executor: modelUsage map, input_tokens, output_tokens
  Executor->>Executor: compactModelUsage()
  Executor->>Pipeline: ExecutionResult with tokens
  Pipeline->>Daemon: token metrics in completion log
  Daemon->>ConnectionHandler: job:result with inputTokens, modelUsage
  ConnectionHandler->>History: result object with token fields
  History->>Database: UPDATE executions SET input_tokens, output_tokens, model_usage
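The final hop, persisting the counters, could look roughly like this. Assumptions: buildTokenUpdate is hypothetical, the id key column and $n positional placeholders (PostgreSQL style) are illustrative, and the real markExecutionCompleted also writes cost, duration and status columns not shown here.

```typescript
/** Token-related part of a completed execution (assumed names). */
interface TokenResult {
  inputTokens?: number;
  outputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
  modelUsage?: ReadonlyArray<Record<string, unknown>>;
}

/** Hypothetical helper: parameterized UPDATE for the migration-016 columns. */
function buildTokenUpdate(
  executionId: string,
  r: TokenResult,
): { text: string; values: unknown[] } {
  const text =
    "UPDATE executions SET input_tokens = $2, output_tokens = $3, " +
    "cache_read_input_tokens = $4, cache_creation_input_tokens = $5, " +
    "model_usage = $6::jsonb WHERE id = $1";
  const values: unknown[] = [
    executionId,
    // ?? (not ||) so a real 0 is written as 0; absent counters become SQL NULL.
    r.inputTokens ?? null,
    r.outputTokens ?? null,
    r.cacheReadInputTokens ?? null,
    r.cacheCreationInputTokens ?? null,
    // Serialize the per-model array for the JSONB column; NULL when absent.
    r.modelUsage === undefined ? null : JSON.stringify(r.modelUsage),
  ];
  return { text, values };
}
```

Absent counters bind as NULL, matching the nullable columns; JS numbers are exact up to 2^53 - 1, far above the BIGINT values a run should realistically produce.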

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

#192: The changes directly implement token usage and cache metrics observability by adding ModelUsageEntry type, extracting metrics via compactModelUsage(), threading through pipeline/daemon, persisting with migration 016, validating with PipelineCompletedLogSchema, and documenting cache hit-ratio formulas and alerting guidance.

Suggested labels

type: feature ✨, type: docs 📋

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(observability): log + persist SDK token usage on executions' clearly and specifically summarizes the main change: capturing and persisting Claude Agent SDK token usage metrics throughout the execution pipeline.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Copilot

Pull request overview

This PR improves observability for Claude Agent SDK executions by logging and persisting full token-usage telemetry (including per-model breakdown) so operators can distinguish cost drivers and compute prompt-cache hit ratios over time.

Changes:

Extend execution result types to carry token counters and a per-model modelUsage array (with costUSD → costUsd normalization).
Add strict schema + tests for pipeline.completed log shape including token counters, and thread token fields through the daemon WS job:result contract into DB persistence.
Add migration 016_executions_tokens to persist four token counters (BIGINT) plus model_usage (JSONB) on executions, and update docs with cache hit-ratio guidance.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


File	Description
test/shared/ws-messages.test.ts	Extends WS schema tests to include token fields and rejects malformed `modelUsage` entries.
test/db/migrate.test.ts	Updates migration count expectations and asserts new `executions` token columns exist.
test/core/log-fields.test.ts	Adds round-trip tests for strict `PipelineCompletedLogSchema` including token counters.
test/core/executor.test.ts	Asserts executor threads token fields + compacts/renames `modelUsage` correctly into `ExecutionResult` and logs.
src/types.ts	Introduces `ModelUsageEntry` and adds optional token usage fields to `ExecutionResult`.
src/shared/ws-messages.ts	Extends `job:result` schema to accept token counters and `modelUsage` array.
src/orchestrator/history.ts	Persists token counters and `model_usage` JSONB into `executions` on completion.
src/orchestrator/connection-handler.ts	Threads token fields + `modelUsage` from WS payload into `markExecutionCompleted`.
src/db/migrations/016_executions_tokens.sql	Adds BIGINT token columns and JSONB `model_usage` to `executions`.
src/daemon/job-executor.ts	Includes token counters + `modelUsage` in daemon `job:result` messages when present.
src/core/pipeline.ts	Logs token counters on `pipeline.completed` event.
src/core/log-fields.ts	Adds strict `PipelineCompletedLogSchema` including token counters.
src/core/executor.ts	Logs `inputTokens`/`outputTokens` and compacts SDK `modelUsage` map into persisted array form.
docs/operate/observability.md	Documents new fields and explains cache hit-ratio metric + alerting approach.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/operate/observability.md`:
- Around line 57-63: Mention that the cache-hit ratio formula
cache_read_input_tokens / (input_tokens + cache_read_input_tokens +
cache_creation_input_tokens) can produce NaN if the denominator is zero; update
the docs to instruct dashboard/alert queries to guard by checking the
denominator > 0 (or using a safe-coalesce/default value) before division and to
treat the metric as undefined/ignore the datapoint when all three counters are
zero; also update the example alert expression sum(cache_read_input_tokens) /
sum(input_tokens + cache_read_input_tokens + cache_creation_input_tokens) to
include the same zero-denominator guard.

In `@src/shared/ws-messages.ts`:
- Line 394: The `model` Zod schema currently allows empty strings; update the
`model: z.string()` declaration in src/shared/ws-messages.ts to validate
non-empty values by changing it to use `.min(1)` (e.g., `model:
z.string().min(1)`), optionally supplying a short error message; this adds
defensive validation at the schema boundary to reject empty model names.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a60f8844-c2dc-4a4a-a70d-cbb500fce99a

📥 Commits

Reviewing files that changed from the base of the PR and between 8b0e8ef and 98efdd0.

📒 Files selected for processing (14)

docs/operate/observability.md
src/core/executor.ts
src/core/log-fields.ts
src/core/pipeline.ts
src/daemon/job-executor.ts
src/db/migrations/016_executions_tokens.sql
src/orchestrator/connection-handler.ts
src/orchestrator/history.ts
src/shared/ws-messages.ts
src/types.ts
test/core/executor.test.ts
test/core/log-fields.test.ts
test/db/migrate.test.ts
test/shared/ws-messages.test.ts

Treat an empty SDK modelUsage map as undefined (omit rather than persist []), sort entries by model for deterministic order, require model.min(1) on the WS schema, and note the zero-denominator cache-ratio edge case. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings June 3, 2026 07:35

Copilot started reviewing on behalf of chrisleekr June 3, 2026 07:35

github-actions Bot added the type: feature ✨ label Jun 3, 2026

coderabbitai Bot added the type: docs 📋 label Jun 3, 2026

Copilot AI reviewed Jun 3, 2026

Comment thread src/core/executor.ts Outdated

Comment thread src/shared/ws-messages.ts Outdated

coderabbitai Bot reviewed Jun 3, 2026

Comment thread docs/operate/observability.md Outdated

Comment thread src/shared/ws-messages.ts Outdated

chrisleekr merged commit 5407dcd into main Jun 3, 2026
22 checks passed

chrisleekr deleted the fix/issue-192 branch June 3, 2026 08:14

chrisleekr mentioned this pull request Jun 4, 2026

feat(observability): emit structured circuit.* events for triage breaker with counters and openMs #216

Open

chrisleekr mentioned this pull request Jun 14, 2026

feat(observability): structured idempotency.* events for claimDelivery outcomes to make dedup-hit and fail-open rates queryable #232

Open

chrisleekr merged 2 commits into main from fix/issue-192
