[Feature] Implement eviction and capacity reclaim on CXL pool exhaustion #4

@seohui-XCENA

Problem

When all CXL memory regions allocated to a client are full, store() fails immediately with no attempt to reclaim space — even if the pool contains stale or least-recently-used entries that could be evicted.

This means long-running inference workloads that exceed their initial allocation will start dropping KV cache entries unnecessarily, degrading serving performance.

Proposed Solution

A multi-phase eviction strategy triggered automatically when page allocation fails during store():

Phase 1: Page-level LRU eviction (within own regions)

When a client's pages are exhausted, the server selects least-recently-used KV entries from that client's owned regions as eviction victims. The client frees the corresponding pages and reuses them, all within a single REQUEST_ALLOC round-trip (no new RPC message type).

Acceptance criteria:

  • store() succeeds transparently when evictable entries exist, instead of failing
  • LRU ordering is maintained with O(1) updates on register, lookup, and delete
  • Batch operations pre-calculate page shortage and evict in a single RPC
  • No regression in existing tests
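
To make the O(1) LRU and batch criteria concrete, here is a minimal Python sketch. It is not the project's implementation: `LruTracker`, `select_victims`, and `page_shortage` are hypothetical names, and per-entry page counts are assumed to be known to the server.

```python
from collections import OrderedDict


class LruTracker:
    """Hypothetical per-client LRU index; oldest entry sits at the front."""

    def __init__(self):
        self._entries = OrderedDict()  # key -> number of pages held

    def register(self, key, num_pages):
        self._entries[key] = num_pages
        self._entries.move_to_end(key)  # O(1): mark as most recently used

    def lookup(self, key):
        if key not in self._entries:
            return False
        self._entries.move_to_end(key)  # O(1): refresh recency on a hit
        return True

    def delete(self, key):
        self._entries.pop(key, None)  # O(1)

    def select_victims(self, pages_needed):
        """Return LRU-first keys covering the shortage, or [] if impossible."""
        victims, freed = [], 0
        for key, num_pages in self._entries.items():
            if freed >= pages_needed:
                break
            victims.append(key)
            freed += num_pages
        return victims if freed >= pages_needed else []


def page_shortage(batch_page_counts, free_pages):
    """Pages a batch store() is short of, computed once before any RPC."""
    return max(0, sum(batch_page_counts) - free_pages)
```

A batch store would call `page_shortage` once and pass the result to a single `select_victims` call, rather than evicting entry by entry.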

Phase 2: Pluggable eviction policy framework

Replace the hardcoded LRU strategy with a pluggable policy interface, allowing alternative strategies (e.g., LFU, TTL-based).

Acceptance criteria:

  • Eviction policy is configurable at server startup
  • Custom policies can be implemented by extending a base class
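
One possible shape for the policy interface, as a sketch only. The class and method names (`EvictionPolicy`, `on_register`, `make_policy`, and so on) are assumptions for illustration, not an agreed API, and the LFU variant sorts on demand for brevity rather than meeting an O(1) bound.

```python
from abc import ABC, abstractmethod
from collections import OrderedDict


class EvictionPolicy(ABC):
    """Hypothetical base class; method names are illustrative only."""

    @abstractmethod
    def on_register(self, key): ...

    @abstractmethod
    def on_lookup(self, key): ...

    @abstractmethod
    def on_delete(self, key): ...

    @abstractmethod
    def select_victims(self, count):
        """Return up to `count` keys, best eviction candidates first."""


class LruPolicy(EvictionPolicy):
    def __init__(self):
        self._order = OrderedDict()  # oldest key first

    def on_register(self, key):
        self._order[key] = None
        self._order.move_to_end(key)

    def on_lookup(self, key):
        if key in self._order:
            self._order.move_to_end(key)

    def on_delete(self, key):
        self._order.pop(key, None)

    def select_victims(self, count):
        return list(self._order)[:count]


class LfuPolicy(EvictionPolicy):
    def __init__(self):
        self._hits = {}  # key -> lookup count

    def on_register(self, key):
        self._hits.setdefault(key, 0)

    def on_lookup(self, key):
        if key in self._hits:
            self._hits[key] += 1

    def on_delete(self, key):
        self._hits.pop(key, None)

    def select_victims(self, count):
        # Sorting is O(n log n); a real LFU would likely use frequency buckets.
        return sorted(self._hits, key=self._hits.get)[:count]


POLICIES = {"lru": LruPolicy, "lfu": LfuPolicy}


def make_policy(name):
    """Resolve the policy named by a (hypothetical) server startup option."""
    try:
        return POLICIES[name]()
    except KeyError:
        raise ValueError(f"unknown eviction policy: {name!r}") from None
```

Under this sketch, a custom strategy such as a TTL-based one would subclass `EvictionPolicy` and be added to the registry.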

Phase 3: Cross-client region eviction

When Phase 1 cannot free enough pages within a client's own regions, evict entire regions from other clients, choosing the regions with the lowest KV reference count first.

Acceptance criteria:

  • Cross-client eviction triggers only when intra-client eviction is insufficient
  • Affected clients are notified and clean up local state gracefully
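
A rough sketch of how Phase 3 victim selection could work, under stated assumptions: the `Region` fields and the `select_cross_client_victims` function are hypothetical, and the notification transport is left out. The result is grouped by owner so each affected client can be told which regions to clean up.

```python
from dataclasses import dataclass


@dataclass
class Region:
    region_id: int
    owner: str       # client that currently owns the region
    kv_refs: int     # number of KV entries referencing the region
    num_pages: int


def select_cross_client_victims(regions, requester, pages_needed, pages_freed_locally):
    """Return {owner: [region_id, ...]} to evict, or {} if nothing should be evicted."""
    remaining = pages_needed - pages_freed_locally
    if remaining <= 0:
        return {}  # intra-client eviction was sufficient; do not touch others

    # Only other clients' regions are candidates, lowest reference count first.
    candidates = sorted(
        (r for r in regions if r.owner != requester),
        key=lambda r: r.kv_refs,
    )

    chosen, freed = [], 0
    for region in candidates:
        if freed >= remaining:
            break
        chosen.append(region)
        freed += region.num_pages
    if freed < remaining:
        return {}  # cannot satisfy the request; evict nothing

    by_owner = {}
    for region in chosen:
        by_owner.setdefault(region.owner, []).append(region.region_id)
    return by_owner
```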

Design considerations

  • Role separation: The server selects victims (global KV metadata visibility), clients execute local cleanup (page freeing, state removal)
  • Performance: Eviction adds latency only on the allocation failure path; normal store() is unaffected
  • Testing: A max_regions_per_instance server flag allows forcing eviction on large CXL pools where exhaustion would otherwise never occur (not for production capacity control)
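
The role separation can be illustrated with a toy, in-process model of the round-trip. Everything here is an assumption for illustration: `Server.request_alloc` stands in for the REQUEST_ALLOC handler, `AllocResponse` is a made-up reply shape, and a lock stands in for whatever serialization the real server uses.

```python
import threading
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class AllocResponse:
    ok: bool
    victims: list = field(default_factory=list)  # keys the client must free


class Server:
    """Selects victims only; it never touches client pages."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lru = {}  # client name -> OrderedDict(key -> pages), oldest first

    def register(self, client, key, pages):
        with self._lock:
            self._lru.setdefault(client, OrderedDict())[key] = pages

    def request_alloc(self, client, shortage):
        # Victim selection and metadata removal happen under one lock,
        # so a concurrent allocation cannot observe a half-done eviction.
        with self._lock:
            entries = self._lru.get(client, OrderedDict())
            victims, freed = [], 0
            for key, pages in entries.items():
                if freed >= shortage:
                    break
                victims.append(key)
                freed += pages
            if freed < shortage:
                return AllocResponse(ok=False)
            for key in victims:
                del entries[key]
            return AllocResponse(ok=True, victims=victims)


class Client:
    """Executes local cleanup: frees victim pages and reuses them."""

    def __init__(self, name, server, total_pages):
        self.name, self.server = name, server
        self.free_pages = total_pages
        self.local = {}  # key -> pages held locally

    def store(self, key, pages):
        shortage = max(0, pages - self.free_pages)
        if shortage:  # eviction cost is paid only on the failure path
            resp = self.server.request_alloc(self.name, shortage)
            if not resp.ok:
                return False
            for victim in resp.victims:
                self.free_pages += self.local.pop(victim)
        self.free_pages -= pages
        self.local[key] = pages
        self.server.register(self.name, key, pages)
        return True
```

In this model a `store()` that fits in free pages never contacts the eviction path, which matches the performance point above.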

Alternatives Considered

  • Client-side eviction: Each client manages its own eviction locally. Rejected because the server has global visibility of KV metadata across all clients, enabling better victim selection and avoiding redundant cross-client coordination.
  • Dedicated EVICT RPC: Add a separate eviction message type. Rejected in favor of extending REQUEST_ALLOC to keep eviction atomic with allocation in a single round-trip, avoiding race conditions between eviction and concurrent allocations.
