feat: add token metering across the full stack#19

Open
skokaina wants to merge 2 commits into main from feature/token-metering

Conversation


@skokaina skokaina commented Apr 1, 2026

Add in-process token usage aggregation that flows from agent process through heartbeat push to CRD status and VS Code plugin.

SDK (TypeScript):

  • UsageLedger class: O(1) record(), snapshot with window reset
  • Reporter heartbeat payload now includes usage field
  • Exported UsageLedger, UsageSnapshot, ModelUsageEntry from @agentspec/sdk

Sidecar:

  • GET /usage endpoint aggregates modelCalls from the audit ring

Control Plane (Python):

  • HeartbeatRequest accepts optional usage field
  • Heartbeat DB model stores usage (nullable JSON column)
  • build_status_patch forwards usage to CRD status
  • GET /agents/{name}/usage endpoint
  • Extracted _get_validated_field helper to deduplicate endpoint pattern

Operator:

  • AgentObservation CRD: status.usage object + Tokens printer column

sdk-langgraph (Python):

  • UsageLedger class (thread-safe, mirrors TypeScript API)
  • instrument_call_model accepts optional ledger parameter
  • Negative token values clamped to 0, snapshot copies are immutable

Tests: 32 new tests across SDK, sidecar, control-plane, sdk-langgraph
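To make the described API concrete, here is a minimal sketch of the UsageLedger described above, in the sdk-langgraph (Python) flavor: thread-safe, O(1) record(), snapshot with optional window reset, negative token values clamped to 0, immutable snapshot copies. Field and method names beyond record()/snapshot() are illustrative, not the PR's actual signatures.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelUsageEntry:
    """Immutable per-model snapshot row."""
    model: str
    calls: int
    input_tokens: int
    output_tokens: int


class UsageLedger:
    """Thread-safe, O(1)-per-record token counter with window reset."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # model -> [calls, input_tokens, output_tokens]
        self._entries: dict[str, list[int]] = {}

    def record(self, model: str, input_tokens: int, output_tokens: int) -> None:
        # Negative token values are clamped to 0, as the PR describes.
        inp = max(0, input_tokens)
        out = max(0, output_tokens)
        with self._lock:
            e = self._entries.setdefault(model, [0, 0, 0])
            e[0] += 1
            e[1] += inp
            e[2] += out

    def snapshot(self, reset: bool = False) -> list[ModelUsageEntry]:
        # Returns frozen copies; reset=True also clears the window (drain).
        with self._lock:
            snap = [
                ModelUsageEntry(m, c, i, o)
                for m, (c, i, o) in self._entries.items()
            ]
            if reset:
                self._entries.clear()
            return snap
```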

@@ -0,0 +1,101 @@
"""Tests for UsageLedger — in-process token usage counter."""

from datetime import datetime, timezone

import pytest
- README: add "Track token usage" to feature list, mention /usage sidecar endpoint, add kubectl token column example
- operating-modes.md: add GET /usage to sidecar and operator endpoint tables
- runtime-introspection.md: new "Token Usage Tracking" section covering UsageLedger API, data flow, sidecar/operator query endpoints, CRD visibility, and VS Code rendering

// ── Private helpers ──────────────────────────────────────────────────────────

function aggregateFromRing(entries: AuditEntry[]): UsageResponse {
@iliassjabali iliassjabali Apr 11, 2026


Sidecar GET /usage aggregates from the audit ring here, but the heartbeat pushed from the SDK ships reporter.usage, which is a separate in-process UsageLedger populated by record() calls (or by instrument_call_model(..., ledger=...) in sdk-langgraph). Both paths are ultimately written by the same agent, but through independent code with independent failure modes. They can diverge when one path is wired and the other is not, or when one push fails. Same endpoint name, different semantics, no way for the user to tell.

Note: the sidecar runs as its own process (see packages/sidecar/src/index.ts), so it cannot read the agent's UsageLedger by shared memory. The ring's modelCalls are populated out of band by the agent posting to /agentspec/events (see packages/sidecar/src/control-plane/events.ts). Any unification needs to respect that boundary.

Two cleaner options to consider:

a) Make the ledger an internal detail of the reporter rather than an opt-in. Any agent that uses reporter automatically has populated usage without needing to pass a separate ledger= parameter. That makes reporter.usage the SDK's single source of truth, and the sidecar ring becomes a diagnostic mirror that is allowed to lag.

b) Add GET /agentspec/usage to the SDK introspection surface, next to /agentspec/health in packages/sdk/src/agent/adapters/fastify.ts and express.ts, returning reporter.usage.snapshot(false). The sidecar's /usage handler then fetches that endpoint when available and falls back to aggregateFromRing() otherwise, preserving the "works without SDK integration" guarantee from CLAUDE.md.

Either way, the key invariant is that a user reading /usage in sidecar mode and /api/v1/agents/{name}/usage in operator mode should see consistent numbers for the same agent.
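Option (b) can be sketched roughly as follows. The real sidecar is TypeScript; this is Python for brevity, and `fetch_sdk_usage` / `aggregate_from_ring` are stand-ins for the actual calls, not names from the PR. Tagging the response with which path produced it also addresses the "no way for the user to tell" problem directly.

```python
def usage_handler(fetch_sdk_usage, aggregate_from_ring):
    """Prefer the SDK's authoritative ledger snapshot; fall back to the ring.

    fetch_sdk_usage: callable hitting the agent's GET /agentspec/usage,
        returning a snapshot dict or None if the endpoint is absent.
    aggregate_from_ring: callable aggregating modelCalls from the audit ring.
    """
    try:
        snapshot = fetch_sdk_usage()
        if snapshot is not None:
            return {"source": "sdk", "usage": snapshot}
    except Exception:
        pass  # SDK endpoint unreachable or not integrated
    # Preserves the "works without SDK integration" guarantee: the ring
    # is populated out of band via POST /agentspec/events.
    return {"source": "ring", "usage": aggregate_from_ring()}
```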

try {
const health = await this.getReport()
const gap = runAudit(this.manifest)
const usage = this.usage.snapshot(true)

@iliassjabali iliassjabali Apr 11, 2026


this.usage.snapshot(true) drains the ledger before the fetch() fires. If the POST fails — network error, control-plane 5xx, timeout, rate-limit — the tokens from that window have already been removed from the ledger but never delivered. Silent data loss, one interval at a time.

Suggested fix: peek first, drain only on success.

// packages/sdk/src/agent/reporter.ts
const health = await this.getReport()
const gap = runAudit(this.manifest)
const usage = this.usage.snapshot(false) // ← peek, don't drain

// ... build body, trim to 64KB, fetch ...

const res = await fetch(`${opts.controlPlaneUrl}/api/v1/heartbeat`, { ... })
if (res.ok) {
  this.usage.snapshot(true) // ← drain only on 2xx
}

This introduces a subtle race: if record() is called between the peek and the drain, those new tokens get dropped on the floor. Two ways to handle it:

  1. Add UsageLedger.drainUpTo(snapshot) — subtracts the already-shipped counts from each entry rather than clearing the map. No races, correctly accumulates tokens recorded during the in-flight fetch.
  2. Accept the race — the window between peek and drain is a single event-loop tick plus fetch() latency. At-most-once-on-failure is a strict improvement over the current at-most-zero-on-failure, even with the race.

Option 2 is probably the right call for v1 (smaller diff, strictly better). Whichever you pick, please add a test: mock fetch to reject on the first call, record tokens, advance the timer, assert the next successful heartbeat includes the first window's tokens.
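For completeness, option 1's drainUpTo can be sketched like this. The data layout (`{model: [calls, input, output]}`) is illustrative, not the ledger's actual internals: subtract the already-shipped counts per entry instead of clearing the map, so tokens recorded during the in-flight fetch accumulate correctly.

```python
def drain_up_to(entries, shipped):
    """Subtract a previously peeked snapshot from the live counters.

    entries: live mutable state, {model: [calls, input_tokens, output_tokens]}
    shipped: the peeked snapshot that was successfully delivered, same shape
    """
    for model, (calls, inp, out) in shipped.items():
        live = entries.get(model)
        if live is None:
            continue
        live[0] -= calls
        live[1] -= inp
        live[2] -= out
        # Drop fully-drained entries so the map stays small.
        if live == [0, 0, 0]:
            del entries[model]
```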
