SSD KV tier: live reconfiguration (enable/disable + resize) on a resident model — control-plane drain+reattach #30

@Pushkinist

Context

Issue #26 landed per-request KV-quant + max-ctx hot-swap on a resident model. The SSD-tier axis was investigated and deliberately deferred — it is not a low-risk mirror of the KV-quant override.

Why it is not a per-request override

The KV-quant override is a seed-salt on a single global PromptCache<E> (KvQuant::cache_key_salt() XOR'd into the block-hash seed in find_best_prefix). No per-request cache is built; codecs coexist as disjoint digest streams in one cache.
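Here is a minimal sketch of the seed-salt mechanism. It assumes an FNV-1a 64-bit chain over little-endian token IDs, and the actual hash in find_best_prefix may differ. Only FNV_OFFSET and the XOR-into-seed idea come from the issue. `block_digests` and the salt values are hypothetical. Each FNV-1a step is a bijection on the 64-bit state, so two different seeds always produce different digests for the same tokens at the same block position. That is why codecs can coexist in one cache as disjoint digest streams.

```rust
// Sketch only: chained FNV-1a 64-bit block hashing with a salted seed.
// The salts used with this are made up; KvQuant::cache_key_salt() supplies the real ones.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Fold `bytes` into the running FNV-1a state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Digest each full block of `block` tokens (a trailing partial block is skipped).
/// Every digest continues from the previous one, so equal digests at position i
/// imply an equal prefix (up to hash collisions). The salt XOR'd into the seed
/// separates codecs.
fn block_digests(tokens: &[u32], block: usize, salt: u64) -> Vec<u64> {
    let mut prev = FNV_OFFSET ^ salt;
    let mut out = Vec::new();
    for chunk in tokens.chunks_exact(block) {
        let mut h = prev;
        for t in chunk {
            h = fnv1a(h, &t.to_le_bytes());
        }
        out.push(h);
        prev = h;
    }
    out
}
```

For example, `block_digests(&tokens, 4, salt_a)` and `block_digests(&tokens, 4, salt_b)` over the same tokens differ at every position. Two prompts that share their first block under the same salt share their first digest.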

The SSD tier is stateful global machinery, not a lookup salt:

  • SsdHydrator (set_ssd_source) holds a fixed layout_key and kv_quant plus an open SQLite index handle, and seeds its hashing with FNV_OFFSET ^ self.layout_key (fixed at attach time, not per request).
  • SsdSpiller (set_spill_sink) is one drain thread spawned with one layout_key, writing .kvb headers with one fixed kv_quant.
  • Both are captured in a single AttachParams, installed once at attach_at_load.
  • The 5 Prometheus/event hooks are process-global OnceLocks.
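Std's `OnceLock` shows the process-global constraint directly: `set` succeeds once and rejects every later call. `SpillHook`, `install_spill_hook`, and `current_spill_hook` are hypothetical names and do not correspond to the real hooks in hooks.rs. Under this model a reattach cannot swap the hooks. It has to keep using whatever was installed at launch.

```rust
use std::sync::OnceLock;

/// Hypothetical hook payload. The real hooks in rmlx-kv-ssd/src/hooks.rs may
/// carry callbacks or metric handles instead.
#[derive(Debug)]
pub struct SpillHook {
    pub label: &'static str,
}

// Process-global: one slot for the whole lifetime of the process.
static SPILL_HOOK: OnceLock<SpillHook> = OnceLock::new();

/// First call wins. A later call gets its hook handed back in `Err` and the
/// original stays installed.
pub fn install_spill_hook(hook: SpillHook) -> Result<(), SpillHook> {
    SPILL_HOOK.set(hook)
}

/// Read the installed hook, if any.
pub fn current_spill_hook() -> Option<&'static SpillHook> {
    SPILL_HOOK.get()
}
```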

A per-request SSD-off (when the tier was enabled at launch) cannot be honored by salting: the source lives on the shared global cache and must be physically detached. A per-request SSD-on with a codec that differs from the resident attach codec needs a different layout_key, i.e. a different hydrator/spiller pair.

What's actually needed (the heavy machinery)

A control-plane drain+reattach setter on a running rmlx serve, not a per-request flag:

  1. Quiesce: block new admissions, drain in-flight decode (reuse the GPU-admission guard).
  2. Detach: drop set_ssd_source/set_spill_sink from the global PromptCache<E>; signal the SsdSpiller drain thread to flush + exit cleanly (it already drains-then-exits on a closed channel); close the SsdKvIndex handle.
  3. Reattach (or stay off): for enable/resize/codec-change, rebuild AttachParams with the new (namespace, kv_quant, layout_key, budget) and re-run prepare_attach + install_ssd_sinks (per-namespace startup maintenance: prune_missing + evict_lru_until).
  4. Resume admissions.
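Here is a minimal sketch of the four steps. It uses a Mutex + Condvar admission gate in place of the existing GPU-admission guard, whose API is not shown here. `AdmissionGate`, `SsdParams`, `SsdTier`, `ResidentCache`, and `reconfigure` are all hypothetical. The detach and reattach legs are placeholders for the real SsdSpiller flush, index close, prepare_attach, and install_ssd_sinks.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical admission gate (not an existing rmlx type).
pub struct AdmissionGate {
    state: Mutex<GateState>,
    cv: Condvar,
}

#[derive(Default)]
struct GateState {
    quiescing: bool,
    in_flight: usize,
}

impl AdmissionGate {
    pub fn new() -> Self {
        Self { state: Mutex::new(GateState::default()), cv: Condvar::new() }
    }

    /// Request path: wait out any reconfiguration, then count this request.
    pub fn admit(&self) {
        let mut s = self.state.lock().unwrap();
        while s.quiescing {
            s = self.cv.wait(s).unwrap();
        }
        s.in_flight += 1;
    }

    /// Request path: this request's decode has finished.
    pub fn finish(&self) {
        let mut s = self.state.lock().unwrap();
        s.in_flight -= 1;
        self.cv.notify_all();
    }

    /// Step 1: refuse new admissions, then wait for in-flight decode to drain.
    pub fn quiesce(&self) {
        let mut s = self.state.lock().unwrap();
        s.quiescing = true;
        while s.in_flight > 0 {
            s = self.cv.wait(s).unwrap();
        }
    }

    /// Step 4: reopen admissions and wake any blocked requests.
    pub fn resume(&self) {
        let mut s = self.state.lock().unwrap();
        s.quiescing = false;
        self.cv.notify_all();
    }
}

/// Hypothetical stand-in for the AttachParams fields listed in step 3.
#[derive(Clone, Debug, PartialEq)]
pub struct SsdParams {
    pub namespace: String,
    pub kv_quant: u8,
    pub layout_key: u64,
    pub budget_bytes: u64,
}

/// Hypothetical stand-in for the attached hydrator + spiller pair.
pub struct SsdTier {
    pub params: SsdParams,
}

pub struct ResidentCache {
    pub ssd: Option<SsdTier>,
}

/// Steps 1-4. `new == None` means "detach and stay off".
pub fn reconfigure(gate: &AdmissionGate, cache: &mut ResidentCache, new: Option<SsdParams>) {
    gate.quiesce();
    // Step 2: detach. The real code would flush and join the spiller and
    // close the SsdKvIndex handle here.
    drop(cache.ssd.take());
    // Step 3: reattach with freshly built params (prepare_attach +
    // install_ssd_sinks in the real code), or leave the tier off.
    cache.ssd = new.map(|params| SsdTier { params });
    gate.resume();
}
```

The `&mut ResidentCache` stands in for exclusive access during the quiesced window. A real server would hold the cache behind shared ownership. A production version would also resume admissions from a drop guard, so that a panic during reattach cannot leave the gate closed.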

Integrity guards that MUST be preserved

  • Composite (hash, layout_key) PK and the layout_key salt partitioning (docs/SSD_TIER.md §layout_key Salt) — never let a reattach alias rows across codecs.
  • The cross-restart canary (docs/SSD_CANARY.md) — its single-(namespace, layout_key)-per-lifetime assumption must be revisited; add a canary variant for mid-life reattach.
  • The pre-release v1 schema wipe and per-namespace LRU budget logic in install_config (OnceLock first-call-wins — a reattach setter must NOT re-call it).
  • The single-MLX claim (unaffected — one model resident throughout).
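An in-memory map keyed by `(hash, layout_key)` can model the composite-key guard. `IndexSketch` is hypothetical and stands in for the SQLite-backed SsdKvIndex. If two codecs ever produced the same block hash, their rows would still stay distinct. A hydrator reattached under a new layout_key only ever sees rows written under that key.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of the SsdKvIndex table. The map key mirrors
/// the composite (hash, layout_key) primary key; the value stands in for the
/// .kvb path.
#[derive(Default)]
pub struct IndexSketch {
    rows: HashMap<(u64, u64), String>,
}

impl IndexSketch {
    /// Spill path: insert or overwrite the row for this (hash, layout_key).
    pub fn upsert(&mut self, hash: u64, layout_key: u64, path: &str) {
        self.rows.insert((hash, layout_key), path.to_string());
    }

    /// Hydrate path: a hydrator queries only with its own attach-time
    /// layout_key, so rows written under another codec are invisible to it.
    pub fn lookup(&self, hash: u64, layout_key: u64) -> Option<&str> {
        self.rows.get(&(hash, layout_key)).map(String::as_str)
    }

    pub fn len(&self) -> usize {
        self.rows.len()
    }
}
```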

Suggested approach

Control-plane setter, not per-request. A POST /admin/ssd (or CLI control channel) that performs the quiesce→detach→reattach sequence above against the resident model. Per-request SSD selection is explicitly out of scope — it would force teardown/reattach on the hot path, risking index/canary integrity for marginal benefit. Bench sweeps (the issue's #1 payoff) are well-served by a between-cell control-plane reattach, which matches how bench mode already flushes between cells.
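Below is one possible way for the POST /admin/ssd handler to decide which leg of the sequence a request needs. The endpoint is still a proposal, so `SsdConfig`, `Plan`, and `plan` are hypothetical. The fields mirror the (namespace, kv_quant, layout_key, budget) tuple from the reattach step. Resize and codec change both map to a full reattach, which matches the suggestion to re-run prepare_attach for either.

```rust
/// Hypothetical requested tier config for POST /admin/ssd.
#[derive(Clone, Debug, PartialEq)]
pub struct SsdConfig {
    pub namespace: String,
    pub kv_quant: u8,
    pub layout_key: u64,
    pub budget_bytes: u64,
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Requested state equals resident state: skip the quiesce entirely.
    NoOp,
    /// Tier on -> off: quiesce, detach, resume.
    Detach,
    /// Tier off -> on: quiesce, attach, resume.
    Attach(SsdConfig),
    /// Resize or codec change: quiesce, detach, attach, resume.
    Reattach(SsdConfig),
}

/// Map (resident config, requested config) to the control-plane action.
pub fn plan(current: Option<&SsdConfig>, requested: Option<SsdConfig>) -> Plan {
    match (current, requested) {
        (None, None) => Plan::NoOp,
        (Some(_), None) => Plan::Detach,
        (None, Some(next)) => Plan::Attach(next),
        (Some(cur), Some(next)) if *cur == next => Plan::NoOp,
        (Some(_), Some(next)) => Plan::Reattach(next),
    }
}
```

A bench sweep would call the endpoint between cells. Under this sketch, cells that keep the same tier config resolve to `Plan::NoOp` and skip the quiesce.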

Scope notes

  • Reuse existing prepare_attach / attach_ssd_tier / install_ssd_sinks for the reattach leg — no new attach path.
  • The AttachParams replay-on-rebuild mechanism (ArchPromptCache::attach) already survives capacity bumps; extend it to be settable post-load under quiesce.
  • Add a clean drain-thread shutdown signal to SsdSpiller (today it exits only on channel close / index-open failure).
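Here is a sketch of the clean shutdown signal, using an in-band `Shutdown` message on the spiller's existing channel. `SpillMsg` and `SpillerSketch` are hypothetical. Messages sent from one thread arrive in order, so blocks queued before `Shutdown` are still written. `shutdown` can also join the thread deterministically while other sender clones keep the channel open. Relying on channel close alone cannot guarantee that.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical channel payload; the real SsdSpiller message type may differ.
pub enum SpillMsg {
    Block { hash: u64, bytes: Vec<u8> },
    Shutdown,
}

pub struct SpillerSketch {
    tx: Sender<SpillMsg>,
    handle: JoinHandle<Vec<u64>>,
}

impl SpillerSketch {
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<SpillMsg>();
        let handle = thread::spawn(move || {
            let mut written = Vec::new();
            // Drain in arrival order; stop at Shutdown or when every sender is gone.
            for msg in rx {
                match msg {
                    SpillMsg::Block { hash, bytes } => {
                        // Real code would write the .kvb file and its index row here.
                        let _ = bytes;
                        written.push(hash);
                    }
                    SpillMsg::Shutdown => break,
                }
            }
            written
        });
        Self { tx, handle }
    }

    /// A sender clone, like a sink installed on the shared cache.
    pub fn sink(&self) -> Sender<SpillMsg> {
        self.tx.clone()
    }

    /// Signal shutdown, then block until the drain thread has exited.
    pub fn shutdown(self) -> Vec<u64> {
        // If the thread already exited (e.g. index-open failure), the send
        // fails harmlessly and join still returns.
        let _ = self.tx.send(SpillMsg::Shutdown);
        self.handle.join().expect("spill drain thread panicked")
    }
}
```

An `AtomicBool` stop flag would also work. However, it would not by itself make the exit wait for blocks that are already queued, which the in-band message provides for free.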

Refs: crates/rmlx-models/src/prompt_cache.rs (ArchPromptCache::attach_ssd_tier, install_ssd_sinks), crates/rmlx-kv-ssd/src/{hydrate,spill,hooks}.rs, crates/rmlx-models/src/ssd_tier.rs (attach_at_load), docs/SSD_TIER.md §"Live reconfiguration (deferred — issue #26)".

Follow-up to #26.

Metadata

  • Labels: enhancement (New feature or request)
  • Projects: none
  • Milestone: none
  • Relationships: none yet
  • Development: no branches or pull requests
