feat: add token metering across the full stack#19

Open
skokaina wants to merge 2 commits into main from feature/token-metering

Conversation


@skokaina skokaina commented Apr 1, 2026

Add in-process token usage aggregation that flows from agent process through heartbeat push to CRD status and VS Code plugin.

SDK (TypeScript):

  • UsageLedger class: O(1) record(), snapshot with window reset
  • Reporter heartbeat payload now includes usage field
  • Exported UsageLedger, UsageSnapshot, ModelUsageEntry from @agentspec/sdk

Sidecar:

  • GET /usage endpoint aggregates modelCalls from the audit ring

Control Plane (Python):

  • HeartbeatRequest accepts optional usage field
  • Heartbeat DB model stores usage (nullable JSON column)
  • build_status_patch forwards usage to CRD status
  • GET /agents/{name}/usage endpoint
  • Extracted _get_validated_field helper to deduplicate endpoint pattern

Operator:

  • AgentObservation CRD: status.usage object + Tokens printer column

sdk-langgraph (Python):

  • UsageLedger class (thread-safe, mirrors TypeScript API)
  • instrument_call_model accepts optional ledger parameter
  • Negative token values clamped to 0, snapshot copies are immutable

Tests: 32 new tests across SDK, sidecar, control-plane, sdk-langgraph
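To make the described API concrete, here is a minimal sketch of the UsageLedger described above, in the sdk-langgraph (Python) flavor: thread-safe, O(1) record(), snapshot with optional window reset, negative token values clamped to 0, immutable snapshot copies. Field and method names beyond record()/snapshot() are illustrative, not the PR's actual signatures.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsageEntry:
    """Immutable per-model snapshot row."""
    model: str
    calls: int
    input_tokens: int
    output_tokens: int


class UsageLedger:
    """Thread-safe, O(1)-per-record token counter with window reset."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # model -> [calls, input_tokens, output_tokens]
        self._entries: dict[str, list[int]] = {}

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        # Negative token values are clamped to 0, as the PR describes.
        inp = max(0, input_tokens)
        out = max(0, output_tokens)
        with self._lock:
            e = self._entries.setdefault(model, [0, 0, 0])
            e[0] += 1
            e[1] += inp
            e[2] += out

    def snapshot(self, reset: bool = False) -> list[ModelUsageEntry]:
        # Returns frozen copies; reset=True also clears the window (drain).
        with self._lock:
            snap = [
                ModelUsageEntry(m, c, i, o)
                for m, (c, i, o) in self._entries.items()
            ]
            if reset:
                self._entries.clear()
            return snap
```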

@@ -0,0 +1,101 @@
"""Tests for UsageLedger — in-process token usage counter."""

from datetime import datetime, timezone

import pytest
- README: add "Track token usage" to feature list, mention /usage sidecar endpoint, add kubectl token column example
- operating-modes.md: add GET /usage to sidecar and operator endpoint tables
- runtime-introspection.md: new "Token Usage Tracking" section covering UsageLedger API, data flow, sidecar/operator query endpoints, CRD visibility, and VS Code rendering

// ── Private helpers ──────────────────────────────────────────────────────────

function aggregateFromRing(entries: AuditEntry[]): UsageResponse {
@iliassjabali iliassjabali Apr 11, 2026


Sidecar GET /usage aggregates from the audit ring here, but the heartbeat pushed from the SDK ships reporter.usage, which is a separate in-process UsageLedger populated by record() calls (or by instrument_call_model(..., ledger=...) in sdk-langgraph). Both paths are ultimately written by the same agent, but through independent code with independent failure modes. They can diverge when one path is wired and the other is not, or when one push fails. Same endpoint name, different semantics, no way for the user to tell.

Note: the sidecar runs as its own process (see packages/sidecar/src/index.ts), so it cannot read the agent's UsageLedger by shared memory. The ring's modelCalls are populated out of band by the agent posting to /agentspec/events (see packages/sidecar/src/control-plane/events.ts). Any unification needs to respect that boundary.

Two cleaner options to consider:

a) Make the ledger an internal detail of the reporter rather than an opt-in. Any agent that uses reporter automatically has populated usage without needing to pass a separate ledger= parameter. That makes reporter.usage the SDK's single source of truth, and the sidecar ring becomes a diagnostic mirror that is allowed to lag.

b) Add GET /agentspec/usage to the SDK introspection surface, next to /agentspec/health in packages/sdk/src/agent/adapters/fastify.ts and express.ts, returning reporter.usage.snapshot(false). The sidecar's /usage handler then fetches that endpoint when available and falls back to aggregateFromRing() otherwise, preserving the "works without SDK integration" guarantee from CLAUDE.md.

Either way, the key invariant is that a user reading /usage in sidecar mode and /api/v1/agents/{name}/usage in operator mode should see consistent numbers for the same agent.
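Option (b) can be sketched roughly as follows. The real sidecar is TypeScript; this is Python for brevity, and `fetch_sdk_usage` / `aggregate_from_ring` are stand-ins for the actual calls, not names from the PR. Tagging the response with which path produced it also addresses the "no way for the user to tell" problem directly.

```python
def usage_handler(fetch_sdk_usage, aggregate_from_ring):
    """Prefer the SDK's authoritative ledger snapshot; fall back to the ring.

    fetch_sdk_usage: callable hitting the agent's GET /agentspec/usage,
        returning a snapshot dict or None if the endpoint is absent.
    aggregate_from_ring: callable aggregating modelCalls from the audit ring.
    """
    try:
        snapshot = fetch_sdk_usage()
        if snapshot is not None:
            return {"source": "sdk", "usage": snapshot}
    except Exception:
        pass  # SDK endpoint unreachable or not integrated
    # Preserves the "works without SDK integration" guarantee: the ring
    # is populated out of band via POST /agentspec/events.
    return {"source": "ring", "usage": aggregate_from_ring()}
```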

try {
const health = await this.getReport()
const gap = runAudit(this.manifest)
const usage = this.usage.snapshot(true)

@iliassjabali iliassjabali Apr 11, 2026


this.usage.snapshot(true) drains the ledger before the fetch() fires. If the POST fails — network error, control-plane 5xx, timeout, rate-limit — the tokens from that window have already been removed from the ledger but never delivered. Silent data loss, one interval at a time.

Suggested fix: peek first, drain only on success.

// packages/sdk/src/agent/reporter.ts
const health = await this.getReport()
const gap = runAudit(this.manifest)
const usage = this.usage.snapshot(false) // ← peek, don't drain

// ... build body, trim to 64KB, fetch ...

const res = await fetch(`${opts.controlPlaneUrl}/api/v1/heartbeat`, { ... })
if (res.ok) {
  this.usage.snapshot(true) // ← drain only on 2xx
}

This introduces a subtle race: if record() is called between the peek and the drain, those new tokens get dropped on the floor. Two ways to handle it:

  1. Add UsageLedger.drainUpTo(snapshot) — subtracts the already-shipped counts from each entry rather than clearing the map. No races, correctly accumulates tokens recorded during the in-flight fetch.
  2. Accept the race — the window between peek and drain is a single event-loop tick plus fetch() latency. At-most-once-on-failure is a strict improvement over the current at-most-zero-on-failure, even with the race.

Option 2 is probably the right call for v1 (smaller diff, strictly better). Whichever you pick, please add a test: mock fetch to reject on the first call, record tokens, advance the timer, assert the next successful heartbeat includes the first window's tokens.
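For completeness, option 1's drainUpTo can be sketched like this. The data layout (`{model: [calls, input, output]}`) is illustrative, not the ledger's actual internals: subtract the already-shipped counts per entry instead of clearing the map, so tokens recorded during the in-flight fetch accumulate correctly.

```python
def drain_up_to(entries, shipped):
    """Subtract a previously peeked snapshot from the live counters.

    entries: live mutable state, {model: [calls, input_tokens, output_tokens]}
    shipped: the peeked snapshot that was successfully delivered, same shape
    """
    for model, (calls, inp, out) in shipped.items():
        live = entries.get(model)
        if live is None:
            continue
        live[0] -= calls
        live[1] -= inp
        live[2] -= out
        # Drop fully-drained entries so the map stays small.
        if live == [0, 0, 0]:
            del entries[model]
```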
