fix(server): drop stale prefix-cache entries when a snapshot slot is reused by davide221 · Pull Request #371 · Luce-Org/lucebox-hub

davide221 · 2026-06-11T14:21:48Z

Problem

Follow-up to #370, fixing the root cause behind its symptom.

The inline and full-compress prefix caches hand out snapshot slots round-robin (next_slot_), and the counter advances in prepare_*_snap even when the snapshot later aborts (degenerate boundary < 512 tokens, failed generation, client disconnect). One burned step is enough: a later confirm wraps onto a slot that a live entry still references. From then on the entry table and the slot contents disagree — the entry's hash describes one token stream, the slot holds a snapshot of another.

A stale entry then misbehaves in two ways:

- follow-up prompt shorter than the slot's snapshot → snapshot_longer_than_prompt, a failed request with an empty completion (the silent agent hang reported in #370, "server: fall back to fresh prefill when a cached snapshot is longer than the prompt"; #370 downgrades it to a conservative cache miss);
- follow-up prompt longer than the slot's snapshot → the restore path attaches KV from the wrong token stream with no validation: silent context corruption, which #370 does not cover.
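
To make the failure mode concrete, here is a minimal toy model of a round-robin slot counter that advances at prepare time. It is a sketch only: `ToyPrefixCache`, its `prepare_snap` / `confirm_snap` methods, and the string "hashes" are hypothetical stand-ins, not the server's actual classes or signatures, and the toy has no eviction policy of its own. It only shows how one aborted prepare makes the entry table and the slot contents diverge with two slots.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only: every name here is hypothetical, not the server's real API.
struct ToyPrefixCache {
    explicit ToyPrefixCache(std::size_t n_slots) : slots_(n_slots) {}

    // Reserve the next slot round-robin. The counter advances here, before
    // anyone knows whether the snapshot will actually be committed.
    std::size_t prepare_snap() {
        std::size_t slot = next_slot_;
        next_slot_ = (next_slot_ + 1) % slots_.size();
        return slot;
    }

    // Commit a snapshot WITHOUT touching older entries for the same slot
    // (the pre-fix behaviour described above).
    void confirm_snap(std::size_t slot, const std::string& prefix_hash) {
        assert(slot < slots_.size());
        slots_[slot] = prefix_hash;    // the slot now holds this stream's snapshot
        entries_[prefix_hash] = slot;  // entry table: hash -> slot
    }

    // Entries whose hash no longer matches what their slot holds.
    std::vector<std::string> stale_entries() const {
        std::vector<std::string> out;
        for (const auto& kv : entries_) {
            if (slots_[kv.second] != kv.first) out.push_back(kv.first);
        }
        return out;
    }

    std::vector<std::string> slots_;  // what each slot currently contains
    std::unordered_map<std::string, std::size_t> entries_;
    std::size_t next_slot_ = 0;
};

// Replays: conv A confirms, a tiny conv aborts after prepare, conv B confirms.
inline std::vector<std::string> replay_burned_step() {
    ToyPrefixCache pc(2);
    pc.confirm_snap(pc.prepare_snap(), "hash-A");  // lands in slot 0
    (void)pc.prepare_snap();                       // slot 1 reserved, then aborted
    pc.confirm_snap(pc.prepare_snap(), "hash-B");  // counter wraps back to slot 0
    return pc.stale_entries();
}
```

In this toy, `replay_burned_step()` leaves the entry for "hash-A" pointing at slot 0 while the slot holds the snapshot for "hash-B", which is the disagreement both bullets above start from.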

Fix

When confirm_inline_snap / confirm_full_snap commit a snapshot into a slot, erase every other entry still pointing at that slot. A slot holds exactly one snapshot, so at most one entry may describe it. 24 lines, no API change.
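
The invariant can be sketched as follows. This is an illustrative toy, not the 24-line patch itself: `ToyPrefixCacheFixed` and its members are hypothetical names, and the real confirm functions may store and erase entries differently. The only point is the rule stated above: on commit, every other entry that still points at the reused slot is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy model only: every name here is hypothetical, not the server's real API.
struct ToyPrefixCacheFixed {
    explicit ToyPrefixCacheFixed(std::size_t n_slots) : slots_(n_slots) {}

    // Round-robin reservation; still advances even if the snapshot aborts.
    std::size_t prepare_snap() {
        std::size_t slot = next_slot_;
        next_slot_ = (next_slot_ + 1) % slots_.size();
        return slot;
    }

    // Commit a snapshot and erase every OTHER entry that references the slot.
    // Returns how many stale entries were dropped.
    std::size_t confirm_snap(std::size_t slot, const std::string& prefix_hash) {
        assert(slot < slots_.size());
        std::size_t dropped = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second == slot && it->first != prefix_hash) {
                it = entries_.erase(it);  // erase returns the next valid iterator
                ++dropped;
            } else {
                ++it;
            }
        }
        slots_[slot] = prefix_hash;
        entries_[prefix_hash] = slot;
        return dropped;
    }

    bool has_entry(const std::string& prefix_hash) const {
        return entries_.count(prefix_hash) != 0;
    }

    std::vector<std::string> slots_;  // what each slot currently contains
    std::unordered_map<std::string, std::size_t> entries_;
    std::size_t next_slot_ = 0;
};

// Same sequence as the problem description: confirm, aborted prepare, confirm.
// Returns the number of entries dropped when the second confirm wraps.
inline std::size_t replay_with_fix(ToyPrefixCacheFixed& pc) {
    pc.confirm_snap(pc.prepare_snap(), "hash-A");         // slot 0
    (void)pc.prepare_snap();                              // burned step (slot 1)
    return pc.confirm_snap(pc.prepare_snap(), "hash-B");  // wraps onto slot 0
}
```

With two slots, the wrap drops the entry for "hash-A", so a later lookup for that prefix is a plain miss instead of a hit on a slot that holds another stream's snapshot.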

Validation (RTX 3090, Qwen3.6-27B Q4_K_M, `--prefix-cache-slots 2`)

Deterministic repro: short conv (snap@3801 → slot 0) → tiny conv whose snap aborts (burns a slot step) → big distinct conv (8.2K tokens, wraps onto slot 0) → shortened follow-up of the first conv.

- on main before #370, this sequence ends in ok=false out=0 error=snapshot_longer_than_prompt;
- with this fix, the wrap logs [pc] dropping stale entry for reused slot=0, the follow-up is a clean miss with correct output, and a longer same-conversation follow-up restores a valid snapshot (correct KV, correct answer): the corruption window is closed and cache effectiveness is preserved.

Greedy outputs across the whole sequence match the no-cache baseline. All 1905 server unit assertions pass.

🧙 Built with WOZCODE

The inline and full-compress prefix caches assign snapshot slots round-robin via next_slot_, which advances in prepare_*_snap even when the snapshot later aborts (degenerate boundary, failed generation, client disconnect). A burned step makes a later confirm wrap onto a slot that a live entry still references. From then on the entry table and the slot contents disagree: the entry's hash describes one token stream, the slot holds a snapshot of another.

Consequences of such a stale entry:
- follow-up prompt shorter than the slot snapshot: failed request (snapshot_longer_than_prompt) before PR #370, conservative cache miss after it;
- follow-up prompt longer than the slot snapshot: the restore path attaches KV from the wrong token stream with no validation - silent context corruption.

Fix the root cause: when confirm_inline_snap / confirm_full_snap commit a snapshot into a slot, erase every other entry still pointing at that slot. A slot holds exactly one snapshot, so at most one entry may describe it.

Verified on RTX 3090 (Qwen3.6-27B Q4_K_M, --prefix-cache-slots 2) with the deterministic PR #370 repro (short conv -> aborted snap -> big conv wrapping onto slot 0 -> shortened follow-up): the wrap now logs '[pc] dropping stale entry for reused slot=0', the follow-up is a clean miss with correct output, and a longer same-conversation follow-up restores a valid snapshot. Greedy outputs across the sequence match the no-cache baseline; 1905 server unit assertions pass.

Co-Authored-By: WOZCODE <contact@withwoz.com>

cubic-dev-ai

No issues found across 1 file


cubic-dev-ai Bot reviewed Jun 11, 2026


davide221 merged commit 53ca591 into main Jun 11, 2026
5 checks passed

davide221 merged 1 commit into main from fix/prefix-cache-stale-slot-entries
