Conversation
Add in-process token usage aggregation that flows from agent process
through heartbeat push to CRD status and VS Code plugin.
SDK (TypeScript):
- UsageLedger class: O(1) record(), snapshot with window reset
- Reporter heartbeat payload now includes usage field
- Exported UsageLedger, UsageSnapshot, ModelUsageEntry from @agentspec/sdk
Sidecar:
- GET /usage endpoint aggregates modelCalls from the audit ring
Control Plane (Python):
- HeartbeatRequest accepts optional usage field
- Heartbeat DB model stores usage (nullable JSON column)
- build_status_patch forwards usage to CRD status
- GET /agents/{name}/usage endpoint
- Extracted _get_validated_field helper to deduplicate endpoint pattern
Operator:
- AgentObservation CRD: status.usage object + Tokens printer column
sdk-langgraph (Python):
- UsageLedger class (thread-safe, mirrors TypeScript API)
- instrument_call_model accepts optional ledger parameter
- Negative token values clamped to 0, snapshot copies are immutable
Tests: 32 new tests across SDK, sidecar, control-plane, sdk-langgraph
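As a rough sketch of the TypeScript ledger described above (field names and the exact `snapshot()` signature here are assumptions for illustration, not the published `@agentspec/sdk` API):

```typescript
interface ModelUsageEntry {
  model: string
  calls: number
  inputTokens: number
  outputTokens: number
}

interface UsageSnapshot {
  windowStart: string
  entries: ModelUsageEntry[]
}

class UsageLedger {
  private entries = new Map<string, ModelUsageEntry>()
  private windowStart = new Date().toISOString()

  // O(1): a single Map lookup per call; negative token counts clamp to 0.
  record(model: string, inputTokens: number, outputTokens: number): void {
    const e = this.entries.get(model) ?? { model, calls: 0, inputTokens: 0, outputTokens: 0 }
    e.calls += 1
    e.inputTokens += Math.max(0, inputTokens)
    e.outputTokens += Math.max(0, outputTokens)
    this.entries.set(model, e)
  }

  // snapshot(true) drains the current window; snapshot(false) peeks.
  // Entries are cloned, so the returned snapshot is effectively immutable.
  snapshot(reset: boolean): UsageSnapshot {
    const snap: UsageSnapshot = {
      windowStart: this.windowStart,
      entries: [...this.entries.values()].map((e) => ({ ...e })),
    }
    if (reset) {
      this.entries.clear()
      this.windowStart = new Date().toISOString()
    }
    return snap
  }
}
```

Under this shape, read-only consumers would call `snapshot(false)` and only the heartbeat path would drain.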
```diff
@@ -0,0 +1,101 @@
+"""Tests for UsageLedger — in-process token usage counter."""
+
+from datetime import datetime, timezone
+
+import pytest
```
- README: add "Track token usage" to feature list, mention /usage sidecar endpoint, add kubectl token column example
- operating-modes.md: add GET /usage to sidecar and operator endpoint tables
- runtime-introspection.md: new "Token Usage Tracking" section covering UsageLedger API, data flow, sidecar/operator query endpoints, CRD visibility, and VS Code rendering
|
|
```typescript
// ── Private helpers ──────────────────────────────────────────────────────────

function aggregateFromRing(entries: AuditEntry[]): UsageResponse {
```
Sidecar `GET /usage` aggregates from the audit ring here, but the heartbeat pushed from the SDK ships `reporter.usage`, which is a separate in-process `UsageLedger` populated by `record()` calls (or by `instrument_call_model(..., ledger=...)` in sdk-langgraph). Both paths are ultimately written by the same agent, but through independent code with independent failure modes. They can diverge when one path is wired and the other is not, or when one push fails. Same endpoint name, different semantics, no way for the user to tell.
Note: the sidecar runs as its own process (see packages/sidecar/src/index.ts), so it cannot read the agent's `UsageLedger` by shared memory. The ring's `modelCalls` are populated out of band by the agent posting to `/agentspec/events` (see packages/sidecar/src/control-plane/events.ts). Any unification needs to respect that boundary.
Two cleaner options to consider:
a) Make the ledger an internal detail of the reporter rather than an opt-in. Any agent that uses reporter automatically has populated usage without needing to pass a separate ledger= parameter. That makes reporter.usage the SDK's single source of truth, and the sidecar ring becomes a diagnostic mirror that is allowed to lag.
b) Add GET /agentspec/usage to the SDK introspection surface, next to /agentspec/health in packages/sdk/src/agent/adapters/fastify.ts and express.ts, returning reporter.usage.snapshot(false). The sidecar's /usage handler then fetches that endpoint when available and falls back to aggregateFromRing() otherwise, preserving the "works without SDK integration" guarantee from CLAUDE.md.
Either way, the key invariant is that a user reading /usage in sidecar mode and /api/v1/agents/{name}/usage in operator mode should see consistent numbers for the same agent.
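A minimal sketch of option (b), assuming a hypothetical `UsageResponse` shape, handler name, and injected fetcher (the real sidecar would wire this into its own HTTP server and types):

```typescript
interface UsageResponse {
  source: 'agent' | 'ring'
  totalTokens: number
}

// Returns null when the agent does not expose the endpoint.
type AgentUsageFetcher = (url: string) => Promise<UsageResponse | null>

async function usageHandler(
  agentBaseUrl: string,
  fetchAgentUsage: AgentUsageFetcher,
  aggregateFromRing: () => UsageResponse,
): Promise<UsageResponse> {
  try {
    // Prefer the agent's own ledger when the SDK exposes it.
    const fromAgent = await fetchAgentUsage(`${agentBaseUrl}/agentspec/usage`)
    if (fromAgent !== null) return { ...fromAgent, source: 'agent' }
  } catch {
    // Agent endpoint unreachable or not implemented: fall through.
  }
  // Fallback preserves the "works without SDK integration" guarantee.
  return { ...aggregateFromRing(), source: 'ring' }
}
```

Tagging the response with a `source` field (an assumption, not in the current API) would also give users a way to tell which path they are reading.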
```typescript
try {
  const health = await this.getReport()
  const gap = runAudit(this.manifest)
  const usage = this.usage.snapshot(true)
```
`this.usage.snapshot(true)` drains the ledger before the `fetch()` fires. If the POST fails — network error, control-plane 5xx, timeout, rate-limit — the tokens from that window have already been removed from the ledger but never delivered. Silent data loss, one interval at a time.
Suggested fix: peek first, drain only on success.
```typescript
// packages/sdk/src/agent/reporter.ts
const health = await this.getReport()
const gap = runAudit(this.manifest)
const usage = this.usage.snapshot(false) // ← peek, don't drain
// ... build body, trim to 64KB, fetch ...
const res = await fetch(`${opts.controlPlaneUrl}/api/v1/heartbeat`, { ... })
if (res.ok) {
  this.usage.snapshot(true) // ← drain only on 2xx
}
```

This introduces a subtle race: if `record()` is called between the peek and the drain, those new tokens get dropped on the floor. Two ways to handle it:
- Add `UsageLedger.drainUpTo(snapshot)` — subtracts the already-shipped counts from each entry rather than clearing the map. No races, correctly accumulates tokens recorded during the in-flight fetch.
- Accept the race — the window between peek and drain is a single event-loop tick plus `fetch()` latency. At-most-once-on-failure is a strict improvement over the current at-most-zero-on-failure, even with the race.
Option 2 is probably the right call for v1 (smaller diff, strictly better). Whichever you pick, please add a test: mock fetch to reject on the first call, record tokens, advance the timer, assert the next successful heartbeat includes the first window's tokens.
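A self-contained sketch of that test's shape; `TinyLedger` and `sendHeartbeat` are hypothetical stand-ins for the real reporter (and the failed POST is modeled as a non-2xx response rather than a rejected promise), just to show the invariant: a failed heartbeat must leave the window's tokens in place for the next one.

```typescript
class TinyLedger {
  private tokens = 0
  record(n: number): void {
    this.tokens += n
  }
  // snapshot(true) drains; snapshot(false) peeks.
  snapshot(drain: boolean): number {
    const t = this.tokens
    if (drain) this.tokens = 0
    return t
  }
}

async function sendHeartbeat(
  ledger: TinyLedger,
  post: (usage: number) => Promise<{ ok: boolean }>,
): Promise<number> {
  const usage = ledger.snapshot(false) // peek, don't drain
  const res = await post(usage)
  if (res.ok) ledger.snapshot(true) // drain only on success
  return usage
}

// First POST fails, second succeeds: both heartbeats must carry the same
// 120 tokens, and only the second one drains the ledger.
async function demo(): Promise<[number, number, number]> {
  const ledger = new TinyLedger()
  ledger.record(120)
  let calls = 0
  const post = async (_usage: number) => ({ ok: ++calls > 1 })
  const first = await sendHeartbeat(ledger, post) // fails, tokens retained
  const second = await sendHeartbeat(ledger, post) // succeeds, drained
  return [first, second, ledger.snapshot(false)]
}
```

The real test would drive the interval with fake timers and a mocked `fetch`, but the assertions are the same three checks.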