Enable FAL prompt cache validation by lewtun · Pull Request #301 · huggingface/ml-intern

lewtun · 2026-06-07T20:20:44Z

Summary

Add top-level Anthropic FAL cache_control hints alongside explicit cache breakpoints
Capture HF Router cache_write_tokens in telemetry as cache creation tokens
Add focused tests for prompt-cache params and cache usage extraction

Validation

Live validation against the web app payload: full rendered system prompt plus 24 tool schemas
Claude Opus 4.8 FAL: second "hi" read 24,769 cached tokens
GPT-5.5 FAL: second "hi" read 11,776 cached tokens
uv run --extra dev pytest tests/unit/test_prompt_caching.py tests/unit/test_telemetry_usage.py
uv run --frozen ruff check .
uv run --frozen ruff format --check .

Co-authored-by: OpenAI Codex <codex@openai.com>

github-actions · 2026-06-07T20:21:03Z

Reviewing: Enable FAL prompt cache validation

Read PR diff and changed files
Analyze prompt_caching.py cache_control hint
Analyze telemetry.py cache_write_tokens extraction
Check test coverage
Post review

View job run

github-actions · 2026-06-07T20:22:03Z

Claude finished @lewtun's task in 1m 21s —— View job

Review: Enable FAL prompt cache validation

No blocking issues — 2 P2. Both changes are narrowly scoped, gated to the FAL router path, and backed by live validation (24,769 cached read tokens on the second turn). The anthropic-only gating means GPT-5.5 and non-router paths are untouched.

Correctness — looks right

agent/core/prompt_caching.py:73-74 — the top-level cache_control hint is added unconditionally for anthropic FAL (outside the if session_id: block at :67), so it fires even on the first turn before a session id exists. Correct, since cache writes need it regardless of session id.
_uses_explicit_cache_control (:32-35) already requires _is_fal_router_request + anthropic/ prefix, so the GPT-5.5 branch (:70-71, :76-77) never sees cache_control. The test_prompt_cache_params_merges_gpt55_cache_hints assertion at tests/unit/test_prompt_caching.py:200-205 confirms it's absent there. Good.
agent/core/telemetry.py:63-69 mirrors the existing cache_read fallback pattern (:56-62) and only fills cache_creation when cache_creation_input_tokens is empty, so Anthropic-native responses are unaffected. cache_creation_tokens is event-only (cost comes from litellm.completion_cost at :118), so no double-count risk.

P2 — test gap: anthropic cache_control without a session id

The only assertion for the new cache_control hint (tests/unit/test_prompt_caching.py:185-188) passes a session_id. The unconditional first-turn path — anthropic FAL with session_id=None still producing extra_body == {"cache_control": {"type": "ephemeral"}} — is the actual cache-write trigger and isn't pinned by a test. A one-line case would lock in the behavior the PR exists to enable.

P2 — duplicate `prompt_tokens_details` lookup

agent/core/telemetry.py:57 and :64 both call _g("prompt_tokens_details", None). Minor; could hoist to a single local above the two if not ... blocks. Purely cosmetic.

Both findings are informational — nothing here blocks merge.

Co-authored-by: OpenAI Codex <codex@openai.com>

github-actions · 2026-06-07T20:26:42Z

Claude finished @lewtun's task in 54s —— View job

Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. Both P2 findings from the prior review are addressed in 083d485.

First-turn cache_control test gap — fixed. test_prompt_cache_params_adds_anthropic_cache_control_without_session_id (tests/unit/test_prompt_caching.py:192-196) now pins the session_id=None path producing extra_body == {"cache_control": {"type": "ephemeral"}}, which is the actual cache-write trigger the PR exists to enable.
Duplicate prompt_tokens_details lookup — fixed. agent/core/telemetry.py:55 now hoists details = _g("prompt_tokens_details", None) into a single local consumed by both the cache_read and cache_creation fallbacks (:57-66). Behavior is identical to the prior two-call version (the details is not None guards preserve the same fallback semantics), so no regression.

No new P0 concerns. The anthropic-only gating via _uses_explicit_cache_control (prompt_caching.py:32-35, :73-74) keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens remains event-only (cost still flows through litellm.completion_cost), so no double-count risk.

Note: I couldn't run the test suite in this environment (command not permitted), but the author reported pytest tests/unit/test_prompt_caching.py tests/unit/test_telemetry_usage.py plus ruff check/format passing in the PR description.
· codex/fal-prompt-cache-validation

Co-authored-by: OpenAI Codex <codex@openai.com>

github-actions · 2026-06-07T20:33:16Z

Claude finished @lewtun's task in 34s —— View job

Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. The new commit fef620e is a one-line copy change only.

frontend/src/components/UsageMeter.tsx:233 — caption reworded from "Billing window resets when you switch back to a task." to "Estimated from HF account usage during this session." Purely a UI string; no behavior change, no P0 concern.

The substantive Python changes (prompt_caching.py cache_control hint, telemetry.py cache_creation extraction) and their test coverage were already reviewed and the prior P2 findings addressed in 083d485. The anthropic-only gating via _uses_explicit_cache_control keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens remains event-only, so no double-count risk.

Nothing in this push changes that assessment.
· codex/fal-prompt-cache-validation

github-actions · 2026-06-07T20:34:32Z

Re-review: Enable FAL prompt cache validation

Read PR diff and changed files
Check the latest push (merge from main) for new concerns
Post review

View job run

Co-authored-by: OpenAI Codex <codex@openai.com>

github-actions · 2026-06-07T20:35:23Z

Claude finished @lewtun's task in 32s —— View job

Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. The latest push is a merge from main (5576cd5) plus a one-line copy change (db86503).

frontend/src/components/UsageMeter.tsx:233 — caption reworded from "Estimated from HF account usage during this session." to "Estimated from HF account usage per session." Purely a UI string; no behavior change, no P0 concern.
Merge from main (5576cd5) brings in unrelated main history only; the substantive files (agent/core/prompt_caching.py, agent/core/telemetry.py) are byte-identical to the previously reviewed state at 083d485 — confirmed via git diff origin/main...HEAD.

The prior P2 findings (first-turn cache_control test gap, duplicate prompt_tokens_details lookup) remain addressed. The anthropic-only gating via _uses_explicit_cache_control (prompt_caching.py:32-35, :73-74) keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens stays event-only (cost flows through litellm.completion_cost), so no double-count risk.

Nothing in this push changes that assessment.
· codex/fal-prompt-cache-validation

Enable FAL prompt cache validation

4dfd3b5

Co-authored-by: OpenAI Codex <codex@openai.com>

Merge branch 'main' into codex/fal-prompt-cache-validation

8469159

Address FAL prompt cache review

083d485

Co-authored-by: OpenAI Codex <codex@openai.com>

Clarify session usage estimate copy

fef620e

Co-authored-by: OpenAI Codex <codex@openai.com>

Merge branch 'main' into codex/fal-prompt-cache-validation

5576cd5

Update usage estimate copy

db86503

Co-authored-by: OpenAI Codex <codex@openai.com>

lewtun merged commit 66cb9ac into main Jun 7, 2026
3 checks passed

lewtun deleted the codex/fal-prompt-cache-validation branch June 7, 2026 20:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable FAL prompt cache validation#301

Enable FAL prompt cache validation#301
lewtun merged 6 commits into
mainfrom
codex/fal-prompt-cache-validation

lewtun commented Jun 7, 2026

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lewtun commented Jun 7, 2026

Summary

Validation

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewing: Enable FAL prompt cache validation

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review: Enable FAL prompt cache validation

Correctness — looks right

P2 — test gap: anthropic cache_control without a session id

P2 — duplicate prompt_tokens_details lookup

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: Enable FAL prompt cache validation

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: Enable FAL prompt cache validation

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: Enable FAL prompt cache validation

Uh oh!

github-actions Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: Enable FAL prompt cache validation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading

P2 — duplicate `prompt_tokens_details` lookup

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading

github-actions Bot commented Jun 7, 2026 •

edited

Loading