Problem
When all CXL memory regions allocated to a client are full, store() fails immediately with no attempt to reclaim space — even if the pool contains stale or least-recently-used entries that could be evicted.
This means long-running inference workloads that exceed their initial allocation will start dropping KV cache entries unnecessarily, degrading serving performance.
Proposed Solution
A multi-phase eviction strategy triggered automatically when page allocation fails during store():
Phase 1: Page-level LRU eviction (within own regions)
When a client's pages are exhausted, the server selects least-recently-used KV entries from that client's owned regions as eviction victims. The client frees the corresponding pages and reuses them, all within a single REQUEST_ALLOC round-trip (no new RPC message type).
Acceptance criteria:
- store() succeeds transparently when evictable entries exist, instead of failing
- LRU ordering is maintained with O(1) updates on register, lookup, and delete
- Batch operations pre-calculate page shortage and evict in a single RPC
- No regression in existing tests
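The O(1) LRU bookkeeping and the batch shortage pre-calculation described above could look roughly like the following. This is a minimal sketch, not the project's implementation: the class name LRUTracker, the helper batch_shortage, and the per-entry page count are all assumptions made for illustration.

```python
from collections import OrderedDict


class LRUTracker:
    """Tracks KV entries in recency order; the oldest entry comes first.

    Hypothetical helper: names and the page-count metadata are assumptions.
    """

    def __init__(self):
        # key -> number of pages the entry occupies
        self._entries = OrderedDict()

    def register(self, key, num_pages):
        # Insert or overwrite, then mark as most recently used: O(1).
        self._entries[key] = num_pages
        self._entries.move_to_end(key)

    def lookup(self, key):
        # A hit refreshes recency: O(1). Returns None on a miss.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def delete(self, key):
        # O(1) removal; missing keys are ignored.
        self._entries.pop(key, None)

    def select_victims(self, pages_needed):
        """Pop least-recently-used entries until enough pages are covered.

        Returns (victim_keys, pages_freed). If every entry is evicted and
        the shortage remains, pages_freed is smaller than pages_needed.
        """
        victims, freed = [], 0
        while self._entries and freed < pages_needed:
            key, num_pages = self._entries.popitem(last=False)
            victims.append(key)
            freed += num_pages
        return victims, freed


def batch_shortage(requested_pages, free_pages):
    """Pages a whole batch is short by, computed once so one RPC can cover it."""
    return max(0, sum(requested_pages) - free_pages)
```

With entries a (2 pages), b (1 page), and c (3 pages) registered in that order, a lookup of a moves it to the back, so a shortage of 3 pages evicts b and then c.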
Phase 2: Pluggable eviction policy framework
Replace the hardcoded LRU strategy with a pluggable policy interface, allowing alternative strategies (e.g., LFU, TTL-based).
Acceptance criteria:
- Eviction policy is configurable at server startup
- Custom policies can be implemented by extending a base class
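One possible shape for the policy interface is an abstract base class plus a name-to-class registry consulted at server startup. Everything here is hypothetical: EvictionPolicy, its hook names, the POLICIES registry, and make_policy are illustrative names, and the LFU variant sorts on demand rather than using an O(1) structure.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class EvictionPolicy(ABC):
    """Hypothetical base class; hook names are assumptions."""

    @abstractmethod
    def on_register(self, key, num_pages): ...

    @abstractmethod
    def on_lookup(self, key): ...

    @abstractmethod
    def on_delete(self, key): ...

    @abstractmethod
    def select_victims(self, pages_needed):
        """Return keys to evict, removing them from the policy's state."""


class LRUPolicy(EvictionPolicy):
    def __init__(self):
        self._pages = OrderedDict()  # key -> pages, oldest first

    def on_register(self, key, num_pages):
        self._pages[key] = num_pages
        self._pages.move_to_end(key)

    def on_lookup(self, key):
        if key in self._pages:
            self._pages.move_to_end(key)

    def on_delete(self, key):
        self._pages.pop(key, None)

    def select_victims(self, pages_needed):
        victims, freed = [], 0
        while self._pages and freed < pages_needed:
            key, num_pages = self._pages.popitem(last=False)
            victims.append(key)
            freed += num_pages
        return victims


class LFUPolicy(EvictionPolicy):
    def __init__(self):
        self._pages = {}  # key -> pages
        self._hits = {}   # key -> lookup count

    def on_register(self, key, num_pages):
        self._pages[key] = num_pages
        self._hits[key] = 0

    def on_lookup(self, key):
        if key in self._hits:
            self._hits[key] += 1

    def on_delete(self, key):
        self._pages.pop(key, None)
        self._hits.pop(key, None)

    def select_victims(self, pages_needed):
        victims, freed = [], 0
        # Least-frequently-used first; ties keep registration order.
        for key in sorted(self._hits, key=self._hits.get):
            if freed >= pages_needed:
                break
            victims.append(key)
            freed += self._pages[key]
        for key in victims:
            self.on_delete(key)
        return victims


# Registry keyed by the value of a (hypothetical) startup option.
POLICIES = {"lru": LRUPolicy, "lfu": LFUPolicy}


def make_policy(name):
    try:
        return POLICIES[name]()
    except KeyError:
        raise ValueError(f"unknown eviction policy: {name!r}") from None
```

A custom strategy such as TTL-based eviction would subclass EvictionPolicy and add itself to the registry; the server would only ever call the four hooks.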
Phase 3: Cross-client region eviction
When Phase 1 cannot free enough pages within a client's own regions, evict entire regions from other clients based on lowest KV reference count.
Acceptance criteria:
- Cross-client eviction triggers only when intra-client eviction is insufficient
- Affected clients are notified and clean up local state gracefully
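The Phase 3 fallback might be sketched as below. It assumes "KV reference count" means the number of KV entries currently referencing a region; the Region fields and both function names are hypothetical, and the notification itself is reduced to grouping evicted region ids by owner.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    owner: str      # client that owns the region
    kv_refs: int    # assumed meaning: KV entries referencing the region
    num_pages: int


def select_cross_client_victims(regions, requester, pages_needed, intra_freed):
    """Pick other clients' regions, lowest kv_refs first.

    Returns an empty list when intra-client eviction already covered the
    request, so cross-client eviction is strictly a fallback.
    """
    remaining = pages_needed - intra_freed
    if remaining <= 0:
        return []
    candidates = sorted(
        (r for r in regions if r.owner != requester),
        key=lambda r: r.kv_refs,
    )
    victims = []
    for region in candidates:
        if remaining <= 0:
            break
        victims.append(region)
        remaining -= region.num_pages
    return victims


def owners_to_notify(victims):
    """Group evicted region ids by owner so each client gets one notice."""
    notices = {}
    for region in victims:
        notices.setdefault(region.owner, []).append(region.region_id)
    return notices
```

Whole regions are evicted here, so the pages freed can exceed the shortage. Whether that overshoot is acceptable is a design decision this sketch does not settle.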
Design considerations
- Role separation: The server selects victims (global KV metadata visibility), clients execute local cleanup (page freeing, state removal)
- Performance: Eviction adds latency only on the allocation failure path; normal store() is unaffected
- Testing: A max_regions_per_instance server flag allows forcing eviction on large CXL pools where exhaustion would otherwise never occur (not for production capacity control)
Alternatives Considered
- Client-side eviction: Each client manages its own eviction locally. Rejected because the server has global visibility of KV metadata across all clients, enabling better victim selection and avoiding redundant cross-client coordination.
- Dedicated EVICT RPC: Add a separate eviction message type. Rejected in favor of extending REQUEST_ALLOC to keep eviction atomic with allocation in a single round-trip, avoiding race conditions between eviction and concurrent allocations.
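To illustrate why folding eviction into REQUEST_ALLOC avoids the race, the toy model below selects victims and grants pages under a single lock, so a concurrent allocation cannot claim the reclaimed pages in between. It is a simplified sketch under stated assumptions: AllocServer and handle_request_alloc are hypothetical names, page accounting lives entirely on the server, and there is no real RPC layer or client-side page freeing.

```python
import threading
from collections import OrderedDict


class AllocServer:
    """Toy server: page accounting only, no CXL memory or RPC transport."""

    def __init__(self, free_pages):
        self._lock = threading.Lock()
        self._free = free_pages
        self._lru = OrderedDict()  # key -> pages, oldest first

    def register(self, key, pages):
        with self._lock:
            self._lru[key] = pages
            self._lru.move_to_end(key)

    def handle_request_alloc(self, pages_needed):
        """Return (ok, victims); eviction and the grant share one lock."""
        with self._lock:
            evictable = sum(self._lru.values())
            if self._free + evictable < pages_needed:
                # Cannot satisfy the request: fail without evicting anything.
                return False, []
            victims = []
            while self._free < pages_needed:
                key, pages = self._lru.popitem(last=False)
                victims.append(key)
                self._free += pages
            self._free -= pages_needed
            return True, victims
```

With a separate EVICT message, the gap between the eviction reply and the follow-up allocation request would be a window in which another client's allocation could consume the freed pages.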