Test it locally: fix nested Claude MAX credential read, add per-token request budget, verify end-to-end by konard · Pull Request #36 · link-assistant/router

konard · 2026-06-09T22:53:00Z

Summary

Fixes #35.

This PR uses the local Docker + Claude MAX access to run the router end-to-end,
verifies that everything the docs claim actually works, fixes the one genuine
code bug found, and adds the "limit how much each task can use" capability the
issue asks for.

Three outcomes:

1. Fixed the real root-cause bug. Real Claude Code writes its OAuth session
   to ~/.claude/.credentials.json nested under a claudeAiOauth object. The
   router only read a flat accessToken, so against an actual login it found
   no token. src/oauth.rs now reads both the nested and flat layouts.
2. Proved token hiding + transparent passthrough end-to-end against
   api.anthropic.com, both natively and through the Docker image. A client
   sending only a la_sk_ token (no anthropic-version, no OAuth beta header)
   gets a working upstream response; the real OAuth token never appears in logs
   or client-visible output.
3. Added a per-token request budget so a scoped token can be handed to a
   separate task with a hard cap on how many upstream requests it may make.

Changes

Fixed — nested Claude MAX credential layout (`src/oauth.rs`)

- extract_token() / expires_at_ms() accept both the nested claudeAiOauth
  object (real Claude Code) and the flat {"accessToken": ...} layout.
- doctor probes the credential file and reports one of three states:
  "found, token OK", "found, NO TOKEN", or "MISSING".
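
The dual-layout lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/oauth.rs: the `Json` enum and the `obj` helper are hypothetical stand-ins for a real JSON parser, the function signatures are guesses, and the choice to prefer the nested claudeAiOauth object when both layouts are present is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal JSON-like value: just enough to model the credential file.
#[derive(Debug, Clone)]
enum Json {
    Str(String),
    Obj(HashMap<String, Json>),
}

impl Json {
    /// Look up a key if this value is an object.
    fn get(&self, key: &str) -> Option<&Json> {
        match self {
            Json::Obj(map) => map.get(key),
            _ => None,
        }
    }

    /// Borrow the string payload if this value is a string.
    fn as_str(&self) -> Option<&str> {
        match self {
            Json::Str(s) => Some(s.as_str()),
            _ => None,
        }
    }
}

/// Return the access token from either layout.
/// Assumption: the nested `claudeAiOauth` object wins when both are present.
fn extract_token(creds: &Json) -> Option<&str> {
    let nested = creds
        .get("claudeAiOauth")
        .and_then(|oauth| oauth.get("accessToken"))
        .and_then(Json::as_str);
    let flat = creds.get("accessToken").and_then(Json::as_str);
    // Empty strings are treated as "no token" in both layouts.
    nested
        .filter(|t| !t.is_empty())
        .or(flat.filter(|t| !t.is_empty()))
}

/// Map a probe result to the three doctor states listed above.
/// `None` models a credential file that does not exist.
fn doctor_status(creds: Option<&Json>) -> &'static str {
    match creds {
        None => "MISSING",
        Some(c) if extract_token(c).is_some() => "found, token OK",
        Some(_) => "found, NO TOKEN",
    }
}

/// Hypothetical helper for building objects in examples.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

With this shape, a file containing only a top-level accessToken and a file containing claudeAiOauth.accessToken both yield a token, while an existing file with neither reports "found, NO TOKEN".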

Added — per-token request budget (`max_requests`)

- src/storage.rs: TokenRecord gains max_requests: Option<u64> and
  used_requests: u64 (both #[serde(default)], backward compatible); the Lino
  text codec round-trips (max_requests N) / (used_requests N);
  TokenStore::try_consume_request checks-and-increments.
- src/token.rs: issue_token_full(...) writes the cap;
  enforce_request_budget(...) returns TokenError::LimitExceeded once hit.
- src/proxy.rs: every forwarding path (Anthropic, OpenAI, Gonka) enforces the
  budget after token validation and returns 429 rate_limit_error when
  exhausted. Admin token endpoints were extracted to a new src/token_admin.rs
  to stay under the 1000-line per-file CI limit.
- src/cli.rs / src/main.rs / src/token_admin.rs: tokens issue --max-requests,
  the POST /api/tokens max_requests field, and a used/max column in
  tokens list.
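
As a rough illustration of the check-and-increment step, here is a minimal in-memory sketch. The type and function names follow the list above, but the signatures, the HashMap-backed store, and the decision to count requests for unlimited tokens are assumptions; persistence is omitted entirely.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TokenError {
    LimitExceeded,
}

#[derive(Debug, Default)]
struct TokenRecord {
    /// `None` means the token is unlimited.
    max_requests: Option<u64>,
    used_requests: u64,
}

/// Assumed in-memory stand-in for the real store.
#[derive(Default)]
struct TokenStore {
    records: HashMap<String, TokenRecord>,
}

impl TokenStore {
    /// Check the cap and, if there is room, count this request.
    /// Returns `false` when the token has already reached its cap.
    fn try_consume_request(&mut self, token_id: &str) -> bool {
        match self.records.get_mut(token_id) {
            // Assumption: an id with no record is permitted, mirroring the
            // test name test_budget_for_unknown_token_is_permitted.
            None => true,
            Some(rec) => {
                if let Some(max) = rec.max_requests {
                    if rec.used_requests >= max {
                        return false;
                    }
                }
                rec.used_requests += 1;
                true
            }
        }
    }
}

/// Translate the boolean result into the error the proxy layer reacts to.
fn enforce_request_budget(store: &mut TokenStore, token_id: &str) -> Result<(), TokenError> {
    if store.try_consume_request(token_id) {
        Ok(())
    } else {
        Err(TokenError::LimitExceeded)
    }
}
```

Doing the check and the increment in one call keeps the counter from drifting past the cap: a request is either counted and forwarded, or rejected without touching used_requests.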

Docs

- README: documented the nested credential layout, transparent header injection
  (anthropic-version default + anthropic-beta: oauth-2025-04-20), and the
  per-token budget; corrected the stale note claiming revocations are lost on
  restart (records are persisted).
- docs/case-studies/issue-35/: full case study, covering a
  requirement-by-requirement trace, online research (primary sources), an
  existing-components survey (LiteLLM virtual keys/budgets, Portkey, Kong AI
  Gateway, community Claude proxies), and redacted live + Docker evidence.
- changelog.d/20260609_233000_issue_35_local_testing.md (bump: minor).
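
The transparent header injection documented in the README could look something like the sketch below. It is an assumption-heavy illustration rather than the router's code: `build_upstream_headers` is a hypothetical name, the default version string "2023-06-01" is assumed (the PR only says a default is supplied), and sending the OAuth token as an Authorization: Bearer header while dropping any client-supplied authorization / x-api-key header is a guess at how the la_sk_ token is swapped out.

```rust
/// Assumed default; the PR only states that a default anthropic-version is injected.
const DEFAULT_ANTHROPIC_VERSION: &str = "2023-06-01";
/// Beta flag named in the PR description.
const OAUTH_BETA: &str = "oauth-2025-04-20";

/// Build the header list sent upstream from the headers the client sent.
/// Header names are lowercased in the output.
fn build_upstream_headers(
    client: &[(String, String)],
    oauth_token: &str,
) -> Vec<(String, String)> {
    let mut out: Vec<(String, String)> = Vec::new();
    let mut betas: Vec<String> = Vec::new();
    let mut has_version = false;

    for (name, value) in client {
        let lower = name.to_ascii_lowercase();
        // Drop client credentials so the scoped token never reaches upstream.
        if lower == "authorization" || lower == "x-api-key" {
            continue;
        }
        // Collect client beta flags so the OAuth flag can be merged in.
        if lower == "anthropic-beta" {
            betas.extend(
                value
                    .split(',')
                    .map(|b| b.trim().to_string())
                    .filter(|b| !b.is_empty()),
            );
            continue;
        }
        if lower == "anthropic-version" {
            has_version = true;
        }
        out.push((lower, value.clone()));
    }

    // Only supply a version when the client did not send one.
    if !has_version {
        out.push((
            "anthropic-version".to_string(),
            DEFAULT_ANTHROPIC_VERSION.to_string(),
        ));
    }
    if !betas.iter().any(|b| b == OAUTH_BETA) {
        betas.push(OAUTH_BETA.to_string());
    }
    out.push(("anthropic-beta".to_string(), betas.join(",")));
    out.push(("authorization".to_string(), format!("Bearer {}", oauth_token)));
    out
}
```

A client that sends nothing but its la_sk_ token would come out of this function with a version header, the OAuth beta flag, and the real credential, which matches the "client sends only la_sk_" check in the verification table.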

How it was verified (live + Docker)

Performed against https://api.anthropic.com with a copy of the real Claude
MAX credentials. The original ~/.claude/.credentials.json was only read/copied
— never modified or deleted (confirmed unchanged at 471 bytes).

| Check | Result |
| --- | --- |
| Client sends only `la_sk_` token to `count_tokens` | HTTP 200 `{"input_tokens":13}` |
| Real OAuth token in server / container logs | 0 occurrences |
| Missing token | 401 |
| Invalid token | 401 |
| Revoked token | 403 |
| Capped token after its budget | our 429 `Token has reached its request limit` (no upstream `request_id`) |
| Usage persistence | text store `(max_requests 2) (used_requests 2)`; `tokens list` → `2/2` |
| Docker image (`Dockerfile`) with copied creds mounted `:ro` | identical results; nested creds read; no token leak |
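
The status codes in the table can be summarized as a small mapping. In this sketch only LimitExceeded is a variant name taken from the PR; Missing, Invalid, and Revoked are hypothetical names, and the JSON envelope around rate_limit_error is assumed to mirror Anthropic's error shape rather than confirmed from the router source.

```rust
/// Token failure cases; only `LimitExceeded` is named in the PR,
/// the other variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenError {
    Missing,
    Invalid,
    Revoked,
    LimitExceeded,
}

/// HTTP status the router answers with for each failure, per the table above.
fn status_for(err: TokenError) -> u16 {
    match err {
        TokenError::Missing | TokenError::Invalid => 401,
        TokenError::Revoked => 403,
        TokenError::LimitExceeded => 429,
    }
}

/// Body for the router-generated 429. The envelope shape is an assumption;
/// the error type and message come from the PR. It carries no request_id
/// because the request never reached upstream.
fn limit_exceeded_body() -> String {
    format!(
        "{{\"type\":\"error\",\"error\":{{\"type\":\"{}\",\"message\":\"{}\"}}}}",
        "rate_limit_error", "Token has reached its request limit"
    )
}
```

The absence of a request_id is what lets an operator tell a router-generated 429 apart from an upstream rate limit.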

Evidence: docs/case-studies/issue-35/raw/ (native) and
docs/case-studies/issue-35/raw/docker/ (container).

Note: live /v1/messages inference returned an upstream 429 with a genuine
Anthropic request_id — a real account-level inference rate limit on the
shared MAX account, not a router bug. count_tokens (not inference-metered)
returning 200 through the same path proves the proxy path itself is healthy.

Tests

- New unit tests in src/token.rs: test_unlimited_token_never_hits_budget,
  test_request_budget_enforced (caps at 3, 4th = LimitExceeded, usage
  persisted), test_budget_for_unknown_token_is_permitted.
- src/storage.rs round-trip literals updated for the new fields.
- src/oauth.rs tests cover nested + flat layouts.
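
To make the storage round-trip concrete, here is a toy encoder/decoder for just the two new fields, using the `(max_requests N) (used_requests N)` form shown in the verification table. It is only a sketch: the real Lino codec in src/storage.rs handles full records and is certainly more general, and the `Budget` struct plus the `encode` / `decode` names are hypothetical.

```rust
/// Hypothetical holder for the two new fields only.
#[derive(Debug, Default, PartialEq)]
struct Budget {
    max_requests: Option<u64>,
    used_requests: u64,
}

/// Write the fields in the parenthesized form seen in the text store.
/// Assumption: an unlimited token simply omits `max_requests`.
fn encode(b: &Budget) -> String {
    let mut parts = Vec::new();
    if let Some(max) = b.max_requests {
        parts.push(format!("(max_requests {})", max));
    }
    parts.push(format!("(used_requests {})", b.used_requests));
    parts.join(" ")
}

/// Read the fields back. Missing or malformed fields keep their defaults,
/// which is what keeps records written before this change readable.
fn decode(text: &str) -> Budget {
    let mut b = Budget::default();
    // Each piece after a '(' looks like "max_requests 2) ...".
    for chunk in text.split('(').skip(1) {
        let inner = chunk.split(')').next().unwrap_or("");
        let mut words = inner.split_whitespace();
        let key = words.next();
        let num = words.next().and_then(|n| n.parse::<u64>().ok());
        match (key, num) {
            (Some("max_requests"), Some(n)) => b.max_requests = Some(n),
            (Some("used_requests"), Some(n)) => b.used_requests = n,
            _ => {} // unknown keys are ignored
        }
    }
    b
}
```

A round-trip test then reduces to asserting that decode(encode(x)) equals x, plus one case for an old record that has neither field.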

Local CI gate (all green)

cargo fmt --check · cargo clippy --all-targets --all-features ·
file-size check (all src/*.rs < 1000 lines) · cargo test --all-features
(141 tests pass) · cargo test --doc · cargo build --release.
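
For readers unfamiliar with the file-size gate, the rule amounts to something like the following. This is a hypothetical Rust rendering for illustration only; the repository's actual check is not shown in this PR and may well be a shell or CI script. `within_limit` and `oversized_sources` are made-up names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True when a source file stays strictly under the line limit.
fn within_limit(source: &str, limit: usize) -> bool {
    source.lines().count() < limit
}

/// List the .rs files directly inside `dir` that break the limit.
fn oversized_sources(dir: &Path, limit: usize) -> io::Result<Vec<String>> {
    let mut offenders = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "rs") {
            let body = fs::read_to_string(&path)?;
            if !within_limit(&body, limit) {
                offenders.push(path.display().to_string());
            }
        }
    }
    // Sort for stable output regardless of directory iteration order.
    offenders.sort();
    Ok(offenders)
}
```

Under this rule a 999-line file passes and a 1000-line file fails, which is why the admin endpoints were moved out of src/proxy.rs.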

Version bump is intentionally not hand-edited in Cargo.toml — the repo
derives it from the changelog.d fragment (bump: minor), enforced by the
prevent_manual_version_modification policy.

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #35

…ement

Add an optional max_requests cap to issued tokens so an operator can bound how much of the shared Claude MAX subscription a single task may consume.

- storage: TokenRecord gains max_requests/used_requests; Lino + binary codecs round-trip them; TokenStore::try_consume_request enforces+increments.
- token: issue_token_full(...) and enforce_request_budget(...); new TokenError::LimitExceeded.
- proxy: all three handler paths (Anthropic, OpenAI, Gonka) enforce the budget after validation, returning HTTP 429 when exhausted.
- cli/http: --max-requests flag and max_requests JSON field; tokens list shows used/max.
- refactor: extract admin token endpoints into src/token_admin.rs to keep proxy.rs under the 1000-line limit.

Tests: budget enforced/unlimited/unknown-id cases.

… updates

- Add docs/case-studies/issue-35 (README analysis, requirements trace, online research, existing-components survey, redacted live evidence)
- Document nested claudeAiOauth credential layout, transparent header injection (anthropic-version + oauth-2025-04-20 beta), and the per-token request budget (max_requests) in README
- Correct stale note claiming revocations are lost on restart
- Add changelog fragment (bump: minor)

Built the image from the repo Dockerfile and ran it with a read-only copy of the real Claude MAX credentials (original never modified). Verified nested credential read, transparent passthrough (count_tokens HTTP 200 with only a la_sk_ token), per-token budget 429, 401s, and no OAuth token in container logs.

konard · 2026-06-09T23:38:14Z

Working session summary

All work is complete. Here's the final summary.

Issue #35 "Test it locally" — Done

PR #36 (ready for review, all CI green): #36

What was found and fixed

- Root-cause bug (code): Real Claude Code writes its OAuth session to ~/.claude/.credentials.json nested under a claudeAiOauth object, but the router only read a flat accessToken, so against a real login it found no token. src/oauth.rs now reads both the nested and flat layouts, and doctor reports credential health.
- The requested capability, "limit how much each task can use": Added a per-token request budget. A token can carry max_requests; each forwarded request increments a persisted used_requests counter; once the cap is hit the router returns 429 rate_limit_error instead of forwarding upstream. Exposed via CLI --max-requests, the POST /api/tokens max_requests field, and a used/max column in tokens list. (Admin endpoints were split into src/token_admin.rs to stay under the 1000-line file limit.)
- Docs fixes: Documented the nested credential layout, transparent header injection (anthropic-version default + anthropic-beta: oauth-2025-04-20), and the budget feature; corrected the stale claim that revocations are lost on restart.

Verified end-to-end (native and Docker), against `api.anthropic.com`

- Client sending only a la_sk_ token → count_tokens returns HTTP 200 ({"input_tokens":13}), proving token substitution + header injection.
- Real OAuth token: 0 occurrences in logs/output.
- Missing/invalid → 401, revoked → 403, budget-exhausted → our 429.
- Original ~/.claude/.credentials.json was only copied/read, never modified or deleted (confirmed unchanged at 471 bytes).

Deliverables

- Case study in docs/case-studies/issue-35/ (requirements trace, online research with primary sources, existing-components survey, redacted live + Docker evidence).
- Changelog fragment (bump: minor); version intentionally not hand-edited (repo derives it from fragments).
- Local gate green: fmt, clippy, file-size, 141 tests, doc-tests, release build. All GitHub CI checks pass as well.

No open comments on the issue or PR were missed.

This summary was automatically extracted from the AI working session output.

konard · 2026-06-09T23:38:22Z

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $12.935831

📊 Context and tokens usage:

Claude Opus 4.8: (3 sub-sessions)

115.6K / 1M (12%) input tokens, 42.7K / 128K (33%) output tokens
118.7K / 1M (12%) input tokens, 33.1K / 128K (26%) output tokens
90.1K / 1M (9%) input tokens, 28.8K / 128K (22%) output tokens

Total: (30.2K new + 321.8K cache writes + 15.4M cache reads) input tokens, 119.6K output tokens, $12.876728 cost

Claude Haiku 4.5:

34.2K / 200K (17%) input tokens, 975 / 64K (2%) output tokens

Total: 34.2K input tokens, 975 output tokens, $0.059103 cost

🤖 Models used:

Tool: Anthropic Claude Code
Requested: opus
Main model: Claude Opus 4.8 (claude-opus-4-8)
Additional models:
- Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (6224KB)

View complete solution draft log

The working session has now ended. Feel free to review and add any feedback on the solution draft.

konard · 2026-06-09T23:40:43Z

✅ Ready to merge

This pull request is now ready to be merged:

All CI checks have passed
No merge conflicts
No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

Initial commit with task details

0f0ad04

Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #35

konard self-assigned this Jun 9, 2026

konard added 3 commits June 9, 2026 23:14

konard changed the title ~~[WIP] Test it locally~~ Test it locally: fix nested Claude MAX credential read, add per-token request budget, verify end-to-end Jun 9, 2026

chore(issue-35): keep e2e + docker experiment evidence and PR body

1e24d6b

konard marked this pull request as ready for review June 9, 2026 23:31

Revert "Initial commit with task details"

4fe2d7f

This reverts commit 0f0ad04.

konard merged commit da47ff6 into main Jun 9, 2026
11 checks passed

konard merged 6 commits into main from issue-35-62a0d9107370
