HPC optimization (IDLE.md clause): blocked on hardware. The only GPU I can reach (centurion) is a production fleet node, so a multi-GB, multi-minute benchmark would disrupt live serving; this needs a dedicated bench box. The known high-value target (GQA-packed quantized-KV FA kernel, ~4× headroom on the MQA deployment shape; see the history of PR #102, "perf: D=512 quantized-KV FA vec kernels (gated gqa_ratio<=4) + server logprobs partial-sort") is a focused CUDA-kernel project, not a safe idle-cycle self-merge.
Idle-cycle maintenance sprint (per /home/me/IDLE.md: don't hold off; keep active). One active epoch; tick items off and append newly found work.
--parallel 1). Issue #19 ("Responses API: misleading error on context overflow, must communicate token limit exceeded") Part-1 status:incomplete: already tested (test_responses_truncation_emits_incomplete_status).
Newly found this epoch
Done this epoch (janitor)
Saved for later / needs input or dedicated hardware
llm-pool.yaml (cloud repo; Markus WIP)