Skip to content

feat: refresh to mid-2026 Claude/Anthropic capabilities#21

Merged
stxkxs merged 1 commit into
mainfrom
feat/claude-mid-2026-currency
Jun 4, 2026
Merged

feat: refresh to mid-2026 Claude/Anthropic capabilities#21
stxkxs merged 1 commit into
mainfrom
feat/claude-mid-2026-currency

Conversation

@stxkxs

@stxkxs stxkxs commented Jun 3, 2026

Copy link
Copy Markdown
Member

See the commit message for full details.

Summary

  • Bumps every Opus escalation path (advisor, lab roles, LLM_POLICY, docs) claude-opus-4-6claude-opus-4-8, and adds it to the Bedrock map (4.6/4.7 kept for back-compat) with a new resolveModelId test.
  • Unsticks @anthropic-ai/claude-agent-sdk ^0.1.0^0.3.161 (lockfile regenerated; validated against the installed 0.3.161 — lint, 267 tests, query() export).
  • Model-aware + cache-aware cost reporting in usage.ts; documents the 2026-06-15 Agent SDK + claude -p subscription-credit billing in docs/transports.md.

Verification

npm run lint / build / format:check clean · npm test 267/267 · npm audit 0 vulnerabilities.

Follow-ups (not in this PR)

  • Durable fix for llm-policy.json escalation tier lives upstream in the nanohype repo (vendored copy reverts on next sync).
  • P2 evaluations are planned separately (gate citation-verification is next).

Six months of platform drift left a few concrete staleness points. This
brings the model references, the Agent SDK pin, and the cost/docs surfaces
current, verified against primary Anthropic/AWS docs.

─── Models & inference ───

- Bump every Opus escalation path to claude-opus-4-8 (current flagship,
  GA 2026-05-28): the advisor model (src/advisor.ts), the external-reviewer
  and prompt-optimizer lab roles (src/team/lab.ts), and the LLM_POLICY
  escalation tier in src/standards.ts that FACTORY_PREAMBLE injects into
  every factory role and produced app. Sonnet 4.6 / Haiku 4.5 remain the
  current Sonnet/Haiku tiers and are unchanged.
- Add claude-opus-4-8 -> anthropic.claude-opus-4-8 to BEDROCK_MODEL_IDS
  (no -v1/date suffix, per the Bedrock model card); keep the 4.6/4.7 entries
  for back-compat. New resolveModelId assertion in inference.test.ts.

─── Agent SDK ───

- Unstick @anthropic-ai/claude-agent-sdk from ^0.1.0 (which cannot resolve
  0.3.x) to ^0.3.161; lockfile regenerated to 0.3.161. The sdk / sdk-k8s
  runtimes were silently held ~2 minors back. Validated against the
  installed 0.3.161: lint clean, 267 tests pass, and the query() export
  resolves.
- sdk-events.ts gains a documented no-op branch for non-init `system`
  messages, so any status/compaction subtype the 0.3.x SDK introduces is an
  explicit ignore rather than a silent drop.

─── Cost reporting ───

- src/usage.ts cost estimation is now model-aware (Opus $5/$25, Sonnet
  $3/$15, Haiku $1/$5, resolved per session from session.agent.model.id) and
  cache-token-aware (read 0.1x, 5-minute write 1.25x). Previously every role
  was priced at a flat Sonnet rate and cache tokens were ignored. fab sums
  API-returned token counts (there is no client-side tokenizer), so only the
  rate card is maintained here.

─── Docs & currency sweep ───

- docs/transports.md documents the 2026-06-15 Agent SDK + `claude -p`
  subscription billing: a monthly Agent SDK credit on Pro/Max/Team/Enterprise
  separate from interactive chat limits; API-key accounts stay pay-as-you-go.
- Illustrative opus-4-6 references swept to 4-8: the fab.schema.json model
  example, the state.test override fixture, and the kagent /
  eks-agent-platform baseline skill docs.
- Vendored src/standards/llm-policy.json escalation tier bumped for
  consistency; its canonical source is the nanohype repo, so the durable fix
  lands there on the next standards sync.

Verification: npm run lint / build / format:check clean; npm test 267/267;
npm audit 0 vulnerabilities.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
@stxkxs stxkxs marked this pull request as ready for review June 4, 2026 02:06
@stxkxs stxkxs merged commit 716e0f7 into main Jun 4, 2026
2 checks passed
stxkxs added a commit that referenced this pull request Jun 4, 2026
…ricing

The native-vs-hand-rolled audit found one genuine catch-up: budget enforcement
and cost tracking only worked on the managed-agents transport. This closes that
gap with native SDK primitives and unifies the pricing tables. The budget
POLICY stays fab's; only enforcement + cost now ride native knobs where the
transport exposes them.

─── Native budget + cost on sdk / sdk-k8s / claude-cli (Rank 1) ───

These transports had NO budget enforcement and NO cost tracking: the tracker in
streamSessionWithAdvisor only accumulates `span.model_request_end` events, which
only managed-agents emits.

- sdk.ts passes `maxBudgetUsd` (from getBudgetLimit) into the Agent SDK
  query(); the SDK stops the run with an `error_max_budget_usd` result when the
  USD cap is exceeded, which sdk-events already maps to session.error. Native
  per-run budget enforcement on the SDK transports.
- sdk-events attaches the result's native `total_cost_usd` onto session.status_idle;
  streamSessionWithAdvisor reads it and reports the actual run cost on those
  transports (previously $0 / unknown). claude-cli rides the same translator, so
  it's covered too.

─── Pricing consolidation (Rank 2) ───

- New src/pricing.ts is the single source: MODEL_RATES (Opus $5/$25, Sonnet
  $3/$15, Haiku $1/$5) + cache multipliers (read 0.1x, write 1.25x) + rateFor +
  estimateCost. Replaces four divergent copies — usage.ts's (good), workflows.ts
  MODEL_PRICING (speed-keyed Sonnet-flat), and perf.ts's per-role + totals
  (Sonnet-flat inline).
- workflows.ts budget tracker now prices spans model+cache-aware via the role's
  model instead of a flat Sonnet rate.
- usage.ts + perf.ts consume pricing.ts; perf per-role and totals are now
  model-aware, so Opus roles are no longer under-priced.

─── Tests ───

- pricing.test.ts — rateFor tiers + sonnet fallback; estimateCost input/output +
  cache-read (0.1x) / cache-write (1.25x) + null cache fields.
- sdk-events.test.ts — total_cost_usd attaches to status_idle; omitted on the
  managed-agents result shape.

Depends on the SDK 0.3.x bump + model-aware usage.ts from #21 (this branch is
stacked on it). Verification: npm run lint / build / format:check clean; 274 tests.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
stxkxs added a commit that referenced this pull request Jun 4, 2026
…ricing

The native-vs-hand-rolled audit found one genuine catch-up: budget enforcement
and cost tracking only worked on the managed-agents transport. This closes that
gap with native SDK primitives and unifies the pricing tables. The budget
POLICY stays fab's; only enforcement + cost now ride native knobs where the
transport exposes them.

─── Native budget + cost on sdk / sdk-k8s / claude-cli (Rank 1) ───

These transports had NO budget enforcement and NO cost tracking: the tracker in
streamSessionWithAdvisor only accumulates `span.model_request_end` events, which
only managed-agents emits.

- sdk.ts passes `maxBudgetUsd` (from getBudgetLimit) into the Agent SDK
  query(); the SDK stops the run with an `error_max_budget_usd` result when the
  USD cap is exceeded, which sdk-events already maps to session.error. Native
  per-run budget enforcement on the SDK transports.
- sdk-events attaches the result's native `total_cost_usd` onto session.status_idle;
  streamSessionWithAdvisor reads it and reports the actual run cost on those
  transports (previously $0 / unknown). claude-cli rides the same translator, so
  it's covered too.

─── Pricing consolidation (Rank 2) ───

- New src/pricing.ts is the single source: MODEL_RATES (Opus $5/$25, Sonnet
  $3/$15, Haiku $1/$5) + cache multipliers (read 0.1x, write 1.25x) + rateFor +
  estimateCost. Replaces four divergent copies — usage.ts's (good), workflows.ts
  MODEL_PRICING (speed-keyed Sonnet-flat), and perf.ts's per-role + totals
  (Sonnet-flat inline).
- workflows.ts budget tracker now prices spans model+cache-aware via the role's
  model instead of a flat Sonnet rate.
- usage.ts + perf.ts consume pricing.ts; perf per-role and totals are now
  model-aware, so Opus roles are no longer under-priced.

─── Tests ───

- pricing.test.ts — rateFor tiers + sonnet fallback; estimateCost input/output +
  cache-read (0.1x) / cache-write (1.25x) + null cache fields.
- sdk-events.test.ts — total_cost_usd attaches to status_idle; omitted on the
  managed-agents result shape.

Depends on the SDK 0.3.x bump + model-aware usage.ts from #21 (this branch is
stacked on it). Verification: npm run lint / build / format:check clean; 274 tests.

Co-authored-by: stxkxsbot <275011021+stxkxsbot@users.noreply.github.com>
@stxkxs stxkxs deleted the feat/claude-mid-2026-currency branch June 4, 2026 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant