feat(batch): heterogeneous scheduler — PagedKVAllocator + GdnStateSlabAllocator (ADR-063)

Continuous batching for the hybrid model needs **two** allocators (this implements part of the ADR-063 serving work): `PagedKVAllocator` for the 6 GQA layers (memory grows linearly with context; pages are quantizable) and `GdnStateSlabAllocator` for the 18 GDN layers (fixed-size state per sequence; checkpointable). Admission control must account for both.

### Tasks
- [ ] `PagedKVAllocator` (fixed pages, block tables, free queues; KV bytes/token = 6 layers · 2 (K+V) · 512 · dtype bytes) + `GdnStateSlabAllocator` (1 slab/seq)
- [ ] admission: `free_kv_pages ≥ ceil((P+M)/T)` AND `free_gdn_slabs ≥ 1` AND scratch fits AND adapter-generation compatible; high-watermark (0.85) soft-reservation for interactive, hard for bench
- [ ] eviction/preemption: LRU prefix pages → pause → PreemptedRecompute (free KV+GDN, retain CPU history); never evict active KV without recompute
- [ ] decode batch mixes GQA (M=batch GEMV) with GDN (per-seq sequential recurrence)
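
The sizing formula and the admission predicate from the task list can be sketched as below. This is a minimal illustration, not the planned implementation: `PoolState`, `Request`, and their field names are hypothetical, `P`, `M`, and `T` are assumed to mean prompt tokens, max new tokens, and tokens per page, and "adapter-generation compatible" is assumed to mean equality. The high-watermark policy is left out because its exact semantics are not specified here.

```python
from dataclasses import dataclass

GQA_LAYERS = 6   # from the issue: only the 6 GQA layers hold paged KV
KV_WIDTH = 512   # from the issue's formula; what 512 decomposes into is not stated


def kv_bytes_per_token(dtype_bytes: int) -> int:
    """bytes/token = 6 * 2 * 512 * dtype, as given in the task list."""
    return GQA_LAYERS * 2 * KV_WIDTH * dtype_bytes


@dataclass
class PoolState:  # hypothetical snapshot of both allocators
    free_kv_pages: int
    free_gdn_slabs: int
    free_scratch_bytes: int
    adapter_generation: int


@dataclass
class Request:  # hypothetical request descriptor
    prompt_tokens: int       # assumed meaning of P
    max_new_tokens: int      # assumed meaning of M
    scratch_bytes: int
    adapter_generation: int


def kv_pages_needed(req: Request, tokens_per_page: int) -> int:
    """ceil((P + M) / T) using integer arithmetic."""
    total = req.prompt_tokens + req.max_new_tokens
    return (total + tokens_per_page - 1) // tokens_per_page


def can_admit(pool: PoolState, req: Request, tokens_per_page: int) -> bool:
    """All four conditions must hold; a request is never admitted on KV alone."""
    return (
        pool.free_kv_pages >= kv_pages_needed(req, tokens_per_page)
        and pool.free_gdn_slabs >= 1
        and pool.free_scratch_bytes >= req.scratch_bytes
        and pool.adapter_generation == req.adapter_generation  # assumed: equality
    )
```

With a 2-byte dtype, `kv_bytes_per_token(2)` is 12288 bytes. A request with `P=100`, `M=50` and `T=16` needs 10 pages, so it is admitted against a pool with 10 free pages only if a GDN slab is also free.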

### Acceptance (ADR-064 gates)
- allocator NEVER allocates KV for GDN layers (runtime assertion #170)
- no OOM in the hard-reservation bench; serving batch tok/s regresses by no more than 7%
- quantized KV (from #118) measurably raises max concurrency
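
As a rough check on the quantized-KV gate, the expected concurrency gain can be estimated from the bytes/token formula. This is a back-of-the-envelope sketch: the KV budget, context length, and slab count are hypothetical inputs, and quantization metadata (scales, zero points) is ignored.

```python
def max_concurrency(kv_budget_bytes: int, context_tokens: int,
                    dtype_bytes: int, gdn_slabs: int) -> int:
    """Sequences that fit, limited by whichever allocator runs out first."""
    # 6 GQA layers * 2 (K and V) * 512 * dtype bytes, per the issue's formula
    kv_bytes_per_seq = 6 * 2 * 512 * dtype_bytes * context_tokens
    return min(kv_budget_bytes // kv_bytes_per_seq, gdn_slabs)


GIB = 2 ** 30
# Hypothetical setup: 8 GiB KV budget, 8192-token contexts, 256 GDN slabs.
fp16 = max_concurrency(8 * GIB, 8192, dtype_bytes=2, gdn_slabs=256)  # 85
int8 = max_concurrency(8 * GIB, 8192, dtype_bytes=1, gdn_slabs=256)  # 170
# With only 128 slabs the GDN allocator becomes the limit instead.
capped = max_concurrency(8 * GIB, 8192, dtype_bytes=1, gdn_slabs=128)  # 128
```

Halving the KV dtype roughly doubles concurrency only while GDN slabs are not the binding constraint, which is why the gate should be measured with both allocators sized realistically.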

Ref: d5§7, ADR-063. Study vLLM Hybrid KV Cache Manager + PagedAttention.

