Speculative decoding for the hybrid model. The hard part is GDN rollback: on rejection the d×d state has already absorbed the rejected draft tokens and is invalid. Do NOT invert the recurrence; snapshot once per speculative window, then on rejection restore the snapshot and replay the accepted prefix. The 0.8B checkpoint ships an MTP head (mtp_num_hidden_layers=1); verify it is usable before training anything.
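A minimal sketch of the snapshot / restore / replay idea, assuming a toy in-place gated delta-rule update as a stand-in for the real GDN kernel. `RecurrentState`, `token_inputs`, `feed`, and `verify_window` are hypothetical names for illustration, not the project's API; the conv state and KV cursor are left out.

```python
class RecurrentState:
    """Toy d x d recurrent state, updated in place (hypothetical stand-in for GDN)."""

    def __init__(self, d):
        self.d = d
        self.S = [[0.0] * d for _ in range(d)]

    def step(self, k, v, alpha, beta):
        # Gated delta rule: S <- alpha * (S - beta * (S k) k^T) + beta * v k^T
        d = self.d
        pred = [sum(self.S[i][j] * k[j] for j in range(d)) for i in range(d)]
        for i in range(d):
            for j in range(d):
                self.S[i][j] = (alpha * (self.S[i][j] - beta * pred[i] * k[j])
                                + beta * v[i] * k[j])

    def snapshot(self):
        # One copy per speculative window, not one per token.
        return [row[:] for row in self.S]

    def restore(self, snap):
        self.S = [row[:] for row in snap]


def token_inputs(tok, d):
    # Assumption: deterministic toy projections in place of real model layers.
    k = [((tok * (j + 1)) % 5 - 2) / 4.0 for j in range(d)]
    v = [((tok + j) % 3 - 1) / 2.0 for j in range(d)]
    return k, v, 0.9, 0.5


def feed(state, tokens):
    for tok in tokens:
        state.step(*token_inputs(tok, state.d))


def verify_window(state, draft, n_accepted):
    """Advance through the whole draft; on rejection, restore and replay."""
    snap = state.snapshot()
    feed(state, draft)               # verifier pass consumes all K draft tokens
    if n_accepted < len(draft):      # rejection: state now holds rejected tokens
        state.restore(snap)
        feed(state, draft[:n_accepted])


# Speculative path: prompt [1, 2], draft [3, 4, 5, 6], only the first 2 accepted.
spec = RecurrentState(3)
feed(spec, [1, 2])
verify_window(spec, [3, 4, 5, 6], n_accepted=2)

# Non-speculative reference: the same tokens, one at a time.
ref = RecurrentState(3)
feed(ref, [1, 2, 3, 4])

print(spec.S == ref.S)
```

Replay repeats the same forward updates from the same starting state, so the result can be compared against the non-speculative path directly, whereas inverting the recurrence would require undoing the gated update numerically.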
Tasks
Acceptance (ADR-064 gates)
- default-on only if effective acceptance ≥0.75 (K=4) after replay accounting — else opt-in for code/repetitive workloads
- state after spec decode == non-spec decode under greedy (S, conv, KV cursor, logits)
- enable only if p50 speedup ≥1.05× and p10 speedup ≥1.00×
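The gates above, together with the early-kill threshold from the Ref note in this issue, can be sketched as a single decision function. This is one possible reading, not the ADR-064 definition: `gate_decision` and `percentile` are hypothetical names, the nearest-rank percentile is an assumed choice, and the effective acceptance rate (after replay accounting) is taken as an already-computed input because its exact formula is not given here.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile (assumed method), q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def gate_decision(effective_accept, speedups,
                  accept_gate=0.75, kill_below=0.72,
                  p50_gate=1.05, p10_gate=1.00):
    """Map measurements to 'kill', 'disabled', 'opt-in', or 'default-on'."""
    if effective_accept < kill_below:
        return "kill"                      # early-kill threshold
    p50 = percentile(speedups, 50)
    p10 = percentile(speedups, 10)
    if p50 < p50_gate or p10 < p10_gate:
        return "disabled"                  # speedup gate not met
    if effective_accept >= accept_gate:
        return "default-on"
    return "opt-in"                        # e.g. code/repetitive workloads only


runs = [1.10, 1.06, 1.02, 1.08, 1.12]      # made-up per-run speedup ratios
print(gate_decision(0.80, runs))
print(gate_decision(0.73, runs))
```

Keeping the thresholds as keyword arguments makes it easy to re-run the decision if the gates change.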
Ref: d3 §5, §6, §7, §8. Builds on NgramSpeculator/MtpVerifier/reset_fast(). Highest-risk experiment; kill early if acceptance is <0.72.