
Test it locally: fix nested Claude MAX credential read, add per-token request budget, verify end-to-end #36

Merged
konard merged 6 commits into main from issue-35-62a0d9107370
Jun 9, 2026

@konard konard commented Jun 9, 2026


Summary

Fixes #35.

This PR uses the local Docker + Claude MAX access to run the router end-to-end,
verifies that everything the docs claim actually works, fixes the one genuine
code bug found, and adds the "limit how much each task can use" capability the
issue asks for.

Three outcomes:

  1. Fixed the real root-cause bug. Real Claude Code writes its OAuth session
    to ~/.claude/.credentials.json nested under a claudeAiOauth object. The
    router only read a flat accessToken, so against an actual login it found
    no token. src/oauth.rs now reads both the nested and flat layouts.
  2. Proved token hiding + transparent passthrough end-to-end against
    api.anthropic.com, both natively and through the Docker image. A client
    sending only a la_sk_ token (no anthropic-version, no OAuth beta header)
    gets a working upstream response; the real OAuth token never appears in logs
    or client-visible output.
  3. Added a per-token request budget so a scoped token can be handed to a
    separate task with a hard cap on how many upstream requests it may make.

Changes

Fixed — nested Claude MAX credential layout (src/oauth.rs)

  • extract_token() / expires_at_ms() accept both the nested claudeAiOauth
    object (real Claude Code) and the flat {"accessToken": ...} layout.
  • doctor probes the credential file and reports found, token OK /
    found, NO TOKEN / MISSING.
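
The dual-layout lookup can be sketched as follows. This is a minimal illustration, not the code in src/oauth.rs: the struct names, the already-deserialized shape, the nested-first precedence, and the `doctor_status` helper are all assumptions made for the example. Only the `claudeAiOauth` and `accessToken` keys and the three doctor messages come from this PR.

```rust
/// Hypothetical, already-deserialized view of the nested `claudeAiOauth` object.
#[derive(Debug, Clone, Default)]
pub struct NestedOAuth {
    pub access_token: Option<String>, // JSON key: "accessToken"
}

/// Hypothetical view of the credentials file, covering both layouts.
#[derive(Debug, Clone, Default)]
pub struct CredentialFile {
    pub claude_ai_oauth: Option<NestedOAuth>, // nested layout (real Claude Code)
    pub access_token: Option<String>,         // flat layout: {"accessToken": ...}
}

/// Treat an empty string the same as a missing token.
fn non_empty(s: Option<&str>) -> Option<&str> {
    s.filter(|t| !t.is_empty())
}

/// Try the nested layout first, then fall back to the flat one.
/// The precedence is an assumption of this sketch.
pub fn extract_token(file: &CredentialFile) -> Option<&str> {
    let nested = file
        .claude_ai_oauth
        .as_ref()
        .and_then(|section| section.access_token.as_deref());
    non_empty(nested).or_else(|| non_empty(file.access_token.as_deref()))
}

/// Map the probe result onto the three messages that `doctor` reports.
/// `None` stands for "the file does not exist".
pub fn doctor_status(file: Option<&CredentialFile>) -> &'static str {
    match file {
        None => "MISSING",
        Some(f) if extract_token(f).is_some() => "found, token OK",
        Some(_) => "found, NO TOKEN",
    }
}
```

Before this fix, a file containing only the nested object would have landed in the "found, NO TOKEN" branch, which matches the failure described in the summary.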

Added — per-token request budget (max_requests)

  • src/storage.rs: TokenRecord gains max_requests: Option<u64> and
    used_requests: u64 (both #[serde(default)], backward compatible); the Lino
    text codec round-trips (max_requests N) / (used_requests N);
    TokenStore::try_consume_request checks-and-increments.
  • src/token.rs: issue_token_full(...) writes the cap;
    enforce_request_budget(...) returns TokenError::LimitExceeded once hit.
  • src/proxy.rs: every forwarding path (Anthropic, OpenAI, Gonka) enforces the
    budget after token validation and returns 429 rate_limit_error when
    exhausted. Admin token endpoints were extracted to a new src/token_admin.rs
    to stay under the 1000-line per-file CI limit.
  • src/cli.rs / src/main.rs / src/token_admin.rs: tokens issue --max-requests, the POST /api/tokens max_requests field, and a used/max
    column in tokens list.
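
A rough sketch of the check-and-increment step, assuming an in-memory map in place of the real persisted store. The field names and `try_consume_request` follow this PR; the `insert` and `used` helpers, the single-variant error enum, and the choice to count usage for unlimited tokens are assumptions of the example. In the PR the error is raised by `enforce_request_budget` in src/token.rs; the sketch folds both steps into one method for brevity.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenRecord {
    pub max_requests: Option<u64>, // None means unlimited
    pub used_requests: u64,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    LimitExceeded,
}

/// In-memory stand-in for the persisted token store.
#[derive(Default)]
pub struct TokenStore {
    records: HashMap<String, TokenRecord>,
}

impl TokenStore {
    /// Hypothetical helper: register a token with an optional cap.
    pub fn insert(&mut self, id: &str, max_requests: Option<u64>) {
        let record = TokenRecord { max_requests, used_requests: 0 };
        self.records.insert(id.to_string(), record);
    }

    /// Check the cap and, if there is room, count one more request.
    /// Unknown ids are permitted here, on the assumption that token
    /// validation has already rejected tokens that should not pass.
    pub fn try_consume_request(&mut self, id: &str) -> Result<(), TokenError> {
        let record = match self.records.get_mut(id) {
            Some(record) => record,
            None => return Ok(()),
        };
        if let Some(max) = record.max_requests {
            if record.used_requests >= max {
                return Err(TokenError::LimitExceeded);
            }
        }
        record.used_requests = record.used_requests.saturating_add(1);
        Ok(())
    }

    /// Hypothetical helper: read back the usage counter.
    pub fn used(&self, id: &str) -> Option<u64> {
        self.records.get(id).map(|r| r.used_requests)
    }
}
```

With a cap of 3, the first three calls succeed and the fourth returns `TokenError::LimitExceeded` without incrementing the counter, which is the behaviour the proxy turns into a 429.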

Docs

  • README: documented the nested credential layout, transparent header injection
    (anthropic-version default + anthropic-beta: oauth-2025-04-20), and the
    per-token budget; corrected the stale note claiming revocations are lost on
    restart (records are persisted).
  • docs/case-studies/issue-35/: full case study — requirement-by-requirement
    trace, online research (primary sources), existing-components survey (LiteLLM
    virtual keys/budgets, Portkey, Kong AI Gateway, community Claude proxies), and
    redacted live + Docker evidence.
  • changelog.d/20260609_233000_issue_35_local_testing.md (bump: minor).
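
The transparent header injection documented in the README can be pictured roughly like this. It is a sketch under several assumptions: that the client's scoped token arrives in `Authorization` or `x-api-key`, that the upstream call uses `Authorization: Bearer`, that the default `anthropic-version` is `2023-06-01`, and that client-supplied beta values are merged rather than replaced. The function name and the tuple-based header representation are hypothetical; only the two header names and the `oauth-2025-04-20` value come from this PR.

```rust
/// Assumed default; the router's actual default is not stated in this PR.
pub const DEFAULT_ANTHROPIC_VERSION: &str = "2023-06-01";
/// Beta flag named in the PR description.
pub const OAUTH_BETA: &str = "oauth-2025-04-20";

/// Build the upstream header list from what the client sent.
/// The client's own credential headers are dropped so the scoped
/// token never travels upstream and the real token never travels back.
pub fn prepare_upstream_headers(
    client: &[(&str, &str)],
    oauth_token: &str,
) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    let mut has_version = false;
    let mut betas: Vec<String> = Vec::new();

    for (name, value) in client {
        let lower = name.to_ascii_lowercase();
        if lower == "authorization" || lower == "x-api-key" {
            continue; // strip the client's scoped token
        }
        if lower == "anthropic-beta" {
            // Keep any beta flags the client asked for.
            betas.extend(
                value
                    .split(',')
                    .map(|b| b.trim().to_string())
                    .filter(|b| !b.is_empty()),
            );
            continue;
        }
        if lower == "anthropic-version" {
            has_version = true; // respect an explicit client choice
        }
        out.push((lower, value.to_string()));
    }

    out.push(("authorization".to_string(), format!("Bearer {}", oauth_token)));
    if !has_version {
        out.push((
            "anthropic-version".to_string(),
            DEFAULT_ANTHROPIC_VERSION.to_string(),
        ));
    }
    if !betas.iter().any(|b| b == OAUTH_BETA) {
        betas.push(OAUTH_BETA.to_string());
    }
    out.push(("anthropic-beta".to_string(), betas.join(",")));
    out
}
```

A client that sends nothing but its `la_sk_` token ends up with all three upstream headers filled in, which is the case exercised by the count_tokens check.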

How it was verified (live + Docker)

Performed against https://api.anthropic.com with a copy of the real Claude
MAX credentials. The original ~/.claude/.credentials.json was only read/copied
— never modified or deleted (confirmed unchanged at 471 bytes).

| Check | Result |
| --- | --- |
| Client sends only la_sk_ token to count_tokens | HTTP 200 {"input_tokens":13} |
| Real OAuth token in server / container logs | 0 occurrences |
| Missing token | 401 |
| Invalid token | 401 |
| Revoked token | 403 |
| Capped token after its budget | our 429 "Token has reached its request limit" (no upstream request_id) |
| Usage persistence | text store: (max_requests 2) (used_requests 2); tokens list: 2/2 |
| Docker image (Dockerfile) with copied creds mounted :ro | identical results; nested creds read; no token leak |

Evidence: docs/case-studies/issue-35/raw/ (native) and
docs/case-studies/issue-35/raw/docker/ (container).
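
The status codes in the table correspond to a mapping along these lines. Only `TokenError::LimitExceeded`, the status codes, and the 429 message appear in this PR; the other variant names and the `status_code` function are hypothetical and used purely for illustration.

```rust
/// Hypothetical error enum; only `LimitExceeded` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenError {
    Missing,
    Invalid,
    Revoked,
    LimitExceeded,
}

/// Message the router returns with its own 429 (taken from the table above).
pub const LIMIT_MESSAGE: &str = "Token has reached its request limit";

/// Map a token failure onto the HTTP status observed in the checks.
pub fn status_code(err: TokenError) -> u16 {
    match err {
        // No token, or a token the router does not recognise.
        TokenError::Missing | TokenError::Invalid => 401,
        // A known token that has been revoked.
        TokenError::Revoked => 403,
        // Budget exhausted: answered locally, never forwarded upstream.
        TokenError::LimitExceeded => 429,
    }
}
```

Because the router answers the exhausted-budget case itself, its 429 carries no upstream request_id, which is how it can be told apart from an upstream rate limit in the evidence files.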

Note: live /v1/messages inference returned an upstream 429 with a genuine
Anthropic request_id — a real account-level inference rate limit on the
shared MAX account, not a router bug. count_tokens (not inference-metered)
returning 200 through the same path proves the proxy path itself is healthy.

Tests

  • New unit tests in src/token.rs: test_unlimited_token_never_hits_budget,
    test_request_budget_enforced (caps at 3, 4th = LimitExceeded, usage
    persisted), test_budget_for_unknown_token_is_permitted.
  • src/storage.rs round-trip literals updated for the new fields.
  • src/oauth.rs tests cover nested + flat layouts.
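
The storage round-trip for the two new fields might look like the sketch below. It assumes the fields are written as flat `(name value)` groups, as quoted in this PR, and ignores everything else in a record; the real Lino codec in src/storage.rs is not shown here and may handle nesting and other fields differently. The function names are hypothetical.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TokenRecord {
    pub max_requests: Option<u64>, // None means unlimited
    pub used_requests: u64,
}

/// Write the budget fields as `(max_requests N) (used_requests N)`.
/// An unlimited token simply omits the `max_requests` group.
pub fn encode_budget(record: &TokenRecord) -> String {
    let mut parts: Vec<String> = Vec::new();
    if let Some(max) = record.max_requests {
        parts.push(format!("(max_requests {})", max));
    }
    parts.push(format!("(used_requests {})", record.used_requests));
    parts.join(" ")
}

/// Read the budget fields back. Missing, unknown, or malformed groups
/// fall back to the defaults, mirroring the `#[serde(default)]`
/// behaviour that keeps older records loadable.
pub fn decode_budget(text: &str) -> TokenRecord {
    let mut record = TokenRecord::default();
    // Each piece after a '(' holds "name value) ..."; skip any prefix.
    for group in text.split('(').skip(1) {
        let inner = group.split(')').next().unwrap_or("");
        let mut fields = inner.split_whitespace();
        let name = fields.next();
        let value = fields.next().and_then(|v| v.parse::<u64>().ok());
        match (name, value) {
            (Some("max_requests"), Some(n)) => record.max_requests = Some(n),
            (Some("used_requests"), Some(n)) => record.used_requests = n,
            _ => {} // not a budget field, or not a number
        }
    }
    record
}
```

A record written before this change has neither group, so it decodes as unlimited with zero usage, which is the backward-compatible default described above.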

Local CI gate (all green)

cargo fmt --check · cargo clippy --all-targets --all-features ·
file-size check (all src/*.rs < 1000 lines) · cargo test --all-features
(141 tests pass) · cargo test --doc · cargo build --release.

Version bump is intentionally not hand-edited in Cargo.toml — the repo
derives it from the changelog.d fragment (bump: minor), enforced by the
prevent_manual_version_modification policy.

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #35
@konard konard self-assigned this Jun 9, 2026
konard added 3 commits June 9, 2026 23:14
…ement

Add an optional max_requests cap to issued tokens so an operator can bound
how much of the shared Claude MAX subscription a single task may consume.

- storage: TokenRecord gains max_requests/used_requests; Lino + binary codecs
  round-trip them; TokenStore::try_consume_request enforces+increments.
- token: issue_token_full(...) and enforce_request_budget(...); new
  TokenError::LimitExceeded.
- proxy: all three handler paths (Anthropic, OpenAI, Gonka) enforce the
  budget after validation, returning HTTP 429 when exhausted.
- cli/http: --max-requests flag and max_requests JSON field; tokens list
  shows used/max.
- refactor: extract admin token endpoints into src/token_admin.rs to keep
  proxy.rs under the 1000-line limit.

Tests: budget enforced/unlimited/unknown-id cases.
… updates

- Add docs/case-studies/issue-35 (README analysis, requirements trace,
  online research, existing-components survey, redacted live evidence)
- Document nested claudeAiOauth credential layout, transparent header
  injection (anthropic-version + oauth-2025-04-20 beta), and the per-token
  request budget (max_requests) in README
- Correct stale note claiming revocations are lost on restart
- Add changelog fragment (bump: minor)
Built the image from the repo Dockerfile and ran it with a read-only copy of
the real Claude MAX credentials (original never modified). Verified nested
credential read, transparent passthrough (count_tokens HTTP 200 with only a
la_sk_ token), per-token budget 429, 401s, and no OAuth token in container
logs.
@konard konard changed the title from "[WIP] Test it locally" to "Test it locally: fix nested Claude MAX credential read, add per-token request budget, verify end-to-end" Jun 9, 2026
@konard konard marked this pull request as ready for review June 9, 2026 23:31

konard commented Jun 9, 2026


Working session summary

All work is complete. Here's the final summary.

Issue #35 "Test it locally" — Done

PR #36 (ready for review, all CI green)

What was found and fixed

  1. Root-cause bug (code): Real Claude Code writes its OAuth session to ~/.claude/.credentials.json nested under a claudeAiOauth object, but the router only read a flat accessToken — so against a real login it found no token. src/oauth.rs now reads both the nested and flat layouts, and doctor reports credential health.

  2. The requested capability — "limit how much each task can use": Added a per-token request budget. A token can carry max_requests; each forwarded request increments a persisted used_requests counter; once the cap is hit the router returns 429 rate_limit_error instead of forwarding upstream. Exposed via CLI --max-requests, the POST /api/tokens max_requests field, and a used/max column in tokens list. (Admin endpoints were split into src/token_admin.rs to stay under the 1000-line file limit.)

  3. Docs fixes: Documented the nested credential layout, transparent header injection (anthropic-version default + anthropic-beta: oauth-2025-04-20), and the budget feature; corrected the stale claim that revocations are lost on restart.

Verified end-to-end (native and Docker), against api.anthropic.com

  • Client sending only a la_sk_ token → count_tokens returns HTTP 200 ({"input_tokens":13}), proving token substitution + header injection.
  • Real OAuth token: 0 occurrences in logs/output.
  • Missing/invalid → 401, revoked → 403, budget-exhausted → our 429.
  • Original ~/.claude/.credentials.json was only copied/read — never modified or deleted (confirmed unchanged at 471 bytes).

Deliverables

  • Case study in docs/case-studies/issue-35/ (requirements trace, online research with primary sources, existing-components survey, redacted live + Docker evidence).
  • Changelog fragment (bump: minor); version intentionally not hand-edited (repo derives it from fragments).
  • Local gate green: fmt, clippy, file-size, 141 tests, doc-tests, release build — and all GitHub CI checks pass.

No open comments on the issue or PR were missed.


This summary was automatically extracted from the AI working session output.


konard commented Jun 9, 2026


🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $12.935831

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

  1. 115.6K / 1M (12%) input tokens, 42.7K / 128K (33%) output tokens
  2. 118.7K / 1M (12%) input tokens, 33.1K / 128K (26%) output tokens
  3. 90.1K / 1M (9%) input tokens, 28.8K / 128K (22%) output tokens

Total: (30.2K new + 321.8K cache writes + 15.4M cache reads) input tokens, 119.6K output tokens, $12.876728 cost

Claude Haiku 4.5:

  • 34.2K / 200K (17%) input tokens, 975 / 64K (2%) output tokens

Total: 34.2K input tokens, 975 output tokens, $0.059103 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Main model: Claude Opus 4.8 (claude-opus-4-8)
  • Additional models:
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (6224KB)


The working session has now ended. Feel free to review the solution draft and add any feedback.


konard commented Jun 9, 2026


✅ Ready to merge

This pull request is now ready to be merged:

  • All CI checks have passed
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@konard konard merged commit da47ff6 into main Jun 9, 2026
11 checks passed