Skip to content

Enable FAL prompt cache validation#301

Merged
lewtun merged 6 commits into
mainfrom
codex/fal-prompt-cache-validation
Jun 7, 2026
Merged

Enable FAL prompt cache validation#301
lewtun merged 6 commits into
mainfrom
codex/fal-prompt-cache-validation

Conversation

@lewtun
Copy link
Copy Markdown
Member

@lewtun lewtun commented Jun 7, 2026

Summary

  • Add top-level Anthropic FAL cache_control hints alongside explicit cache breakpoints
  • Capture HF Router cache_write_tokens in telemetry as cache creation tokens
  • Add focused tests for prompt-cache params and cache usage extraction

Validation

  • Live validation against the web app payload: full rendered system prompt plus 24 tool schemas
  • Claude Opus 4.8 FAL: second "hi" read 24,769 cached tokens
  • GPT-5.5 FAL: second "hi" read 11,776 cached tokens
  • uv run --extra dev pytest tests/unit/test_prompt_caching.py tests/unit/test_telemetry_usage.py
  • uv run --frozen ruff check .
  • uv run --frozen ruff format --check .

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Reviewing: Enable FAL prompt cache validation

  • Read PR diff and changed files
  • Analyze prompt_caching.py cache_control hint
  • Analyze telemetry.py cache_write_tokens extraction
  • Check test coverage
  • Post review

View job run

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 1m 21s —— View job


Review: Enable FAL prompt cache validation

No blocking issues — 2 P2. Both changes are narrowly scoped, gated to the FAL router path, and backed by live validation (24,769 cached read tokens on the second turn). The anthropic-only gating means GPT-5.5 and non-router paths are untouched.

Correctness — looks right

  • agent/core/prompt_caching.py:73-74 — the top-level cache_control hint is added unconditionally for anthropic FAL (outside the if session_id: block at :67), so it fires even on the first turn before a session id exists. Correct, since cache writes need it regardless of session id.
  • _uses_explicit_cache_control (:32-35) already requires _is_fal_router_request + anthropic/ prefix, so the GPT-5.5 branch (:70-71, :76-77) never sees cache_control. The test_prompt_cache_params_merges_gpt55_cache_hints assertion at tests/unit/test_prompt_caching.py:200-205 confirms it's absent there. Good.
  • agent/core/telemetry.py:63-69 mirrors the existing cache_read fallback pattern (:56-62) and only fills cache_creation when cache_creation_input_tokens is empty, so Anthropic-native responses are unaffected. cache_creation_tokens is event-only (cost comes from litellm.completion_cost at :118), so no double-count risk.

P2 — test gap: anthropic cache_control without a session id

The only assertion for the new cache_control hint (tests/unit/test_prompt_caching.py:185-188) passes a session_id. The unconditional first-turn path — anthropic FAL with session_id=None still producing extra_body == {"cache_control": {"type": "ephemeral"}} — is the actual cache-write trigger and isn't pinned by a test. A one-line case would lock in the behavior the PR exists to enable.

P2 — duplicate prompt_tokens_details lookup

agent/core/telemetry.py:57 and :64 both call _g("prompt_tokens_details", None). Minor; could hoist to a single local above the two if not ... blocks. Purely cosmetic.


Both findings are informational — nothing here blocks merge.

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 54s —— View job


Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. Both P2 findings from the prior review are addressed in 083d485.

  • First-turn cache_control test gap — fixed. test_prompt_cache_params_adds_anthropic_cache_control_without_session_id (tests/unit/test_prompt_caching.py:192-196) now pins the session_id=None path producing extra_body == {"cache_control": {"type": "ephemeral"}}, which is the actual cache-write trigger the PR exists to enable.
  • Duplicate prompt_tokens_details lookup — fixed. agent/core/telemetry.py:55 now hoists details = _g("prompt_tokens_details", None) into a single local consumed by both the cache_read and cache_creation fallbacks (:57-66). Behavior is identical to the prior two-call version (the details is not None guards preserve the same fallback semantics), so no regression.

No new P0 concerns. The anthropic-only gating via _uses_explicit_cache_control (prompt_caching.py:32-35, :73-74) keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens remains event-only (cost still flows through litellm.completion_cost), so no double-count risk.

Note: I couldn't run the test suite in this environment (command not permitted), but the author reported pytest tests/unit/test_prompt_caching.py tests/unit/test_telemetry_usage.py plus ruff check/format passing in the PR description.
· codex/fal-prompt-cache-validation

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 34s —— View job


Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. The new commit fef620e is a one-line copy change only.

  • frontend/src/components/UsageMeter.tsx:233 — caption reworded from "Billing window resets when you switch back to a task." to "Estimated from HF account usage during this session." Purely a UI string; no behavior change, no P0 concern.

The substantive Python changes (prompt_caching.py cache_control hint, telemetry.py cache_creation extraction) and their test coverage were already reviewed and the prior P2 findings addressed in 083d485. The anthropic-only gating via _uses_explicit_cache_control keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens remains event-only, so no double-count risk.

Nothing in this push changes that assessment.
· codex/fal-prompt-cache-validation

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Re-review: Enable FAL prompt cache validation

  • Read PR diff and changed files
  • Check the latest push (merge from main) for new concerns
  • Post review

View job run

Co-authored-by: OpenAI Codex <codex@openai.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

Claude finished @lewtun's task in 32s —— View job


Re-review: Enable FAL prompt cache validation

LGTM — no blocking issues. The latest push is a merge from main (5576cd5) plus a one-line copy change (db86503).

  • frontend/src/components/UsageMeter.tsx:233 — caption reworded from "Estimated from HF account usage during this session." to "Estimated from HF account usage per session." Purely a UI string; no behavior change, no P0 concern.
  • Merge from main (5576cd5) brings in unrelated main history only; the substantive files (agent/core/prompt_caching.py, agent/core/telemetry.py) are byte-identical to the previously reviewed state at 083d485 — confirmed via git diff origin/main...HEAD.

The prior P2 findings (first-turn cache_control test gap, duplicate prompt_tokens_details lookup) remain addressed. The anthropic-only gating via _uses_explicit_cache_control (prompt_caching.py:32-35, :73-74) keeps GPT-5.5 and non-router paths untouched, and cache_creation_tokens stays event-only (cost flows through litellm.completion_cost), so no double-count risk.

Nothing in this push changes that assessment.
· codex/fal-prompt-cache-validation

@lewtun lewtun merged commit 66cb9ac into main Jun 7, 2026
3 checks passed
@lewtun lewtun deleted the codex/fal-prompt-cache-validation branch June 7, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant