Context
Issue #26 landed per-request KV-quant + max-ctx hot-swap on a resident model. The SSD-tier axis was investigated and deliberately deferred — it is not a low-risk mirror of the KV-quant override.
Why it is not a per-request override
The KV-quant override is a seed-salt on a single global PromptCache<E> (KvQuant::cache_key_salt() XOR'd into the block-hash seed in find_best_prefix). No per-request cache is built; codecs coexist as disjoint digest streams in one cache.
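To make the seed-salt idea concrete, here is a minimal sketch of a salted block-hash chain. It is not the rmlx code: `chain_digests`, the two salt constants, and the block size are hypothetical stand-ins, and only the 64-bit FNV-1a offset and prime are standard values. The real salts come from KvQuant::cache_key_salt(), which the issue does not spell out.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical per-codec salts (assumption: the real values are
/// whatever `KvQuant::cache_key_salt()` returns).
pub const SALT_F16: u64 = 0;
pub const SALT_Q8: u64 = 0x9e37_79b9_7f4a_7c15;

/// Chained FNV-1a digests over fixed-size token blocks.
///
/// The salt is XOR'd into the seed once. Because each FNV-1a step
/// (XOR a byte, multiply by an odd constant mod 2^64) is a bijection
/// on the state, two different seeds give different digests for the
/// same tokens at every block. `block` must be non-zero.
pub fn chain_digests(tokens: &[u32], block: usize, salt: u64) -> Vec<u64> {
    let mut h = FNV_OFFSET ^ salt;
    let mut out = Vec::new();
    for chunk in tokens.chunks_exact(block) {
        for tok in chunk {
            for byte in tok.to_le_bytes() {
                h ^= u64::from(byte);
                h = h.wrapping_mul(FNV_PRIME);
            }
        }
        // One digest per full block; a trailing partial block is skipped.
        out.push(h);
    }
    out
}
```

Under these assumptions, the same prompt hashed under two codecs yields two digest streams that never meet, so one cache can hold both without a per-request cache being built.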
The SSD tier is stateful global machinery, not a lookup salt:
- SsdHydrator (set_ssd_source) holds a fixed layout_key+kv_quant and an open SQLite index handle; seeds with FNV_OFFSET ^ self.layout_key (attach-time, not per-request).
- SsdSpiller (set_spill_sink) is one drain thread spawned with one layout_key, writing .kvb headers with one fixed kv_quant.
- Both captured in a single AttachParams installed once at attach_at_load.
- The 5 Prometheus/event hooks are process-global OnceLocks.
A per-request SSD-off (when launch was on) cannot be honored by salting — the source lives on the shared global cache and must be physically detached. A per-request SSD-on with a codec differing from the resident attach codec needs a different layout_key, i.e. a different hydrator/spiller.
What's actually needed (the heavy machinery)
A control-plane drain+reattach setter on a running rmlx serve, not a per-request flag:
- Quiesce: block new admissions, drain in-flight decode (reuse the GPU-admission guard).
- Detach: drop set_ssd_source/set_spill_sink from the global PromptCache<E>; signal the SsdSpiller drain thread to flush + exit cleanly (it already drains-then-exits on a closed channel); close the SsdKvIndex handle.
- Reattach (or stay off): for enable/resize/codec-change, rebuild AttachParams with the new (namespace, kv_quant, layout_key, budget) and re-run prepare_attach + install_ssd_sinks (per-namespace startup maintenance: prune_missing + evict_lru_until).
- Resume admissions.
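The four steps above can be sketched as a single setter. This is an illustrative model only: `Server`, `Spiller`, `reconfigure`, and the two-field `AttachParams` are hypothetical stand-ins for the real types, and a count of received hashes stands in for writing .kvb files. It assumes the drain thread exits once its channel closes, as the issue states.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical, trimmed-down stand-in for the real AttachParams.
#[derive(Clone, Debug, PartialEq)]
pub struct AttachParams {
    pub namespace: String,
    pub layout_key: u64,
}

/// Hypothetical spiller: one drain thread per attach.
struct Spiller {
    tx: Sender<u64>,
    drain: JoinHandle<usize>,
}

impl Spiller {
    fn spawn() -> Spiller {
        let (tx, rx) = channel::<u64>();
        // `rx.iter()` ends only after the queue is empty and every
        // Sender is dropped, so the thread drains and then exits.
        let drain = thread::spawn(move || rx.iter().count());
        Spiller { tx, drain }
    }

    fn shutdown(self) -> Result<usize, &'static str> {
        let Spiller { tx, drain } = self;
        drop(tx); // close the channel
        drain.join().map_err(|_| "drain thread panicked")
    }
}

#[derive(Default)]
pub struct Server {
    blocked: bool,
    in_flight: usize,
    ssd: Option<(AttachParams, Spiller)>,
}

impl Server {
    pub fn try_admit(&mut self) -> bool {
        if self.blocked {
            return false;
        }
        self.in_flight += 1;
        true
    }

    pub fn finish(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    pub fn spill(&self, hash: u64) -> bool {
        match &self.ssd {
            Some((_, sp)) => sp.tx.send(hash).is_ok(),
            None => false,
        }
    }

    pub fn layout_key(&self) -> Option<u64> {
        self.ssd.as_ref().map(|(p, _)| p.layout_key)
    }

    /// Quiesce, detach, reattach (or stay off), resume.
    /// Returns how many queued blocks the old drain thread flushed.
    pub fn reconfigure(&mut self, next: Option<AttachParams>) -> Result<usize, &'static str> {
        // Quiesce: the gate stays closed on error so in-flight work can
        // only shrink; the caller retries once it reaches zero.
        self.blocked = true;
        if self.in_flight > 0 {
            return Err("in-flight requests still draining");
        }
        // Detach: drop the old sink and wait for its flush.
        let flushed = match self.ssd.take() {
            Some((_, spiller)) => spiller.shutdown()?,
            None => 0,
        };
        // Reattach with the new parameters, or stay off for `None`.
        self.ssd = next.map(|p| (p, Spiller::spawn()));
        // Resume admissions.
        self.blocked = false;
        Ok(flushed)
    }
}
```

A real setter would wait on the GPU-admission guard rather than return an error, and would also close and reopen the index handle. The sketch only shows the ordering that keeps a spill from racing a detach.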
Integrity guards that MUST be preserved
- Composite (hash, layout_key) PK and the layout_key salt partitioning (docs/SSD_TIER.md §layout_key Salt): never let a reattach alias rows across codecs.
- The cross-restart canary (docs/SSD_CANARY.md): its single-(namespace, layout_key)-per-lifetime assumption must be revisited; add a canary variant for mid-life reattach.
- The pre-release v1 schema wipe and per-namespace LRU budget logic in install_config (OnceLock first-call-wins, so a reattach setter must NOT re-call it).
- The single-MLX claim (unaffected — one model resident throughout).
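The OnceLock first-call-wins behaviour behind the install_config guard can be shown in a few lines. The body below is a hypothetical sketch, not the real install_config: `SsdConfig` and its `budget_bytes` field are assumed names. It relies only on `std::sync::OnceLock::get_or_init`, which runs the initializer at most once.

```rust
use std::sync::OnceLock;

/// Hypothetical config payload; the real struct is not shown in the issue.
#[derive(Debug, PartialEq)]
pub struct SsdConfig {
    pub budget_bytes: u64,
}

static CONFIG: OnceLock<SsdConfig> = OnceLock::new();

/// First call wins. A later call gets the original value back and its
/// own argument is silently dropped, which is why a reattach setter
/// that re-called this would appear to succeed while changing nothing.
pub fn install_config(cfg: SsdConfig) -> &'static SsdConfig {
    CONFIG.get_or_init(|| cfg)
}
```

Calling it first with a 1 GiB budget and then with a 1 MiB budget returns the 1 GiB config both times. A reattach that needs a new budget therefore has to carry it through its own state (for example the rebuilt AttachParams) rather than through this path.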
Suggested approach
Control-plane setter, not per-request. A POST /admin/ssd (or CLI control channel) that performs the quiesce→detach→reattach sequence above against the resident model. Per-request SSD selection is explicitly out of scope — it would force teardown/reattach on the hot path, risking index/canary integrity for marginal benefit. Bench sweeps (the issue's #1 payoff) are well-served by a between-cell control-plane reattach, which matches how bench mode already flushes between cells.
Scope notes
- Reuse existing prepare_attach / attach_ssd_tier / install_ssd_sinks for the reattach leg; no new attach path.
- The AttachParams replay-on-rebuild mechanism (ArchPromptCache::attach) already survives capacity bumps; extend it to be settable post-load under quiesce.
- Add a clean drain-thread shutdown signal to SsdSpiller (today it exits only on channel close / index-open failure).
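One possible shape for the shutdown signal in the last note is an explicit message on the existing channel, so the drain thread can stop even while cloned senders are still alive. This is a sketch under assumptions: `SpillMsg`, `Spiller::sender`, and `Spiller::shutdown` are hypothetical names, and collecting hashes into a `Vec` stands in for writing blocks to disk.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical message type: payload blocks plus an explicit stop.
pub enum SpillMsg {
    Block(u64),
    Shutdown,
}

pub struct Spiller {
    tx: Sender<SpillMsg>,
    drain: Option<JoinHandle<Vec<u64>>>,
}

impl Spiller {
    pub fn spawn() -> Spiller {
        let (tx, rx) = channel();
        let drain = thread::spawn(move || {
            let mut written = Vec::new();
            // The channel is FIFO, so everything queued before
            // Shutdown is handled before the loop breaks.
            for msg in rx {
                match msg {
                    SpillMsg::Block(hash) => written.push(hash),
                    SpillMsg::Shutdown => break,
                }
            }
            written
        });
        Spiller { tx, drain: Some(drain) }
    }

    /// Extra producers keep the channel open, which is why relying on
    /// channel close alone is not enough for a mid-life detach.
    pub fn sender(&self) -> Sender<SpillMsg> {
        self.tx.clone()
    }

    /// Flush and stop. Safe to call twice: the second call finds no
    /// thread to join and returns an empty list.
    pub fn shutdown(&mut self) -> Vec<u64> {
        // Ignore the error case where the thread has already exited.
        let _ = self.tx.send(SpillMsg::Shutdown);
        match self.drain.take() {
            Some(handle) => handle.join().unwrap_or_default(),
            None => Vec::new(),
        }
    }
}
```

The trade-off is that blocks sent after Shutdown are lost, so this only works if the quiesce step has already stopped producers before the signal is sent.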
Refs: crates/rmlx-models/src/prompt_cache.rs (ArchPromptCache::attach_ssd_tier, install_ssd_sinks), crates/rmlx-kv-ssd/src/{hydrate,spill,hooks}.rs, crates/rmlx-models/src/ssd_tier.rs (attach_at_load), docs/SSD_TIER.md §"Live reconfiguration (deferred — issue #26)".
Follow-up to #26.