feat(inference): GDN snapshot-per-window speculative rollback + n-gram/MTP verify

Speculative decoding for the hybrid model. The hard part is **GDN rollback**: on rejection the d×d state has already absorbed the rejected draft tokens and is invalid. Do NOT invert the recurrence; instead **snapshot once per speculative window**, then restore and replay the accepted prefix on rejection. The 0.8B checkpoint **ships an MTP head** (`mtp_num_hidden_layers=1`): verify that it is usable before training anything.
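A minimal sketch of the snapshot, restore, and replay flow, to make the rollback contract concrete. Everything here is hypothetical: `toy_step` is a stand-in decay-plus-outer-product update, not the real gated delta rule, and `new_state` / `run_window` are illustrative names rather than existing code. Real speculative decoding also appends a corrected token after a rejection, which this sketch leaves out.

```python
import copy

def new_state(d=2):
    # One d x d recurrent state, as nested lists (toy stand-in for a GDN state).
    return [[0.0] * d for _ in range(d)]

def toy_step(S, token, decay=0.9):
    # Hypothetical update: S <- decay * S + k v^T, with k and v derived from the token id.
    d = len(S)
    k = [float((token + i) % 3 - 1) for i in range(d)]
    v = [((2 * token + j) % 5) / 4.0 for j in range(d)]
    for i in range(d):
        for j in range(d):
            S[i][j] = decay * S[i][j] + k[i] * v[j]

def run_window(S, kv_cursor, draft, n_accepted):
    # Snapshot once per window: recurrent state plus a KV cursor marker.
    snap_S, snap_cursor = copy.deepcopy(S), kv_cursor
    # Verify pass: the state advances through all K draft tokens.
    for tok in draft:
        toy_step(S, tok)
        kv_cursor += 1
    if n_accepted < len(draft):
        # Rejection: restore the snapshot, rewind the cursor, replay the accepted prefix.
        S[:] = snap_S
        kv_cursor = snap_cursor
        for tok in draft[:n_accepted]:
            toy_step(S, tok)
            kv_cursor += 1
    return kv_cursor

# Speculative path: K=4 draft tokens, only the first 2 accepted.
S, ref = new_state(), new_state()
cursor = run_window(S, 10, [5, 7, 9, 11], n_accepted=2)
# Non-speculative reference: step the accepted tokens directly.
for tok in [5, 7]:
    toy_step(ref, tok)
```

Because the replay applies the same operations in the same order from the same starting state, `S` and `ref` should match exactly, which is the property the greedy-equivalence gate checks.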

### Tasks
- [ ] snapshot ring (2-3 slots): copy 18 GDN states (9 MiB f16, ~50-150µs) + KV cursor markers per window
- [ ] verify-pass: GQA batched attention over K positions + GDN **micro-chunk** recurrence (one S read/write, K-step loop in registers — NOT K separate kernels)
- [ ] reject → restore snapshot + `reset_fast()` KV + replay accepted prefix
- [ ] n-gram speculator first (zero training); then verify MTP head usability
- [ ] log acceptance distribution + replay length + snapshot cost
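For the n-gram-first task, a zero-training proposer can be sketched as a suffix lookup over the tokens generated so far. This is only an illustration of the idea: `propose_ngram` and `accepted_prefix_len` are hypothetical helpers and do not describe the existing `NgramSpeculator` interface, and the parameters `n` and `k` are assumptions.

```python
def propose_ngram(history, n=2, k=4):
    """Propose up to k tokens by matching the last n tokens against earlier history."""
    if len(history) <= n:
        return []
    suffix = history[-n:]
    # Scan from the most recent earlier position backwards for the same n-gram.
    for start in range(len(history) - n - 1, -1, -1):
        if history[start:start + n] == suffix:
            # Draft = whatever followed that earlier occurrence.
            return history[start + n:start + n + k]
    return []

def accepted_prefix_len(draft, target):
    """Greedy verify: count leading draft tokens that equal the target model's argmax tokens."""
    count = 0
    for d_tok, t_tok in zip(draft, target):
        if d_tok != t_tok:
            break
        count += 1
    return count

history = [1, 2, 3, 4, 1, 2]
draft = propose_ngram(history, n=2, k=4)             # continuation after the earlier "1 2"
n_ok = accepted_prefix_len(draft, [3, 4, 9, 9])      # target agrees on the first two tokens
```

The accepted length feeds both the replay length and the acceptance distribution that the last task asks to log.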

### Acceptance (ADR-064 gates)
- **default-on only if effective acceptance ≥0.75 (K=4) after replay accounting** — else opt-in for code/repetitive workloads
- state after spec decode == non-spec decode under greedy (S, conv, KV cursor, logits)
- p50 speedup ≥1.05, p10 ≥1.00 to enable
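The numeric gates above could be evaluated with a small helper like the one below. How the thresholds combine (speedup gates decide whether the feature can be enabled at all, the acceptance gate decides default-on versus opt-in) is an assumption read from the bullets, not something ADR-064 is quoted as saying here. `gate_mode` is a hypothetical name, and effective acceptance is taken as a precomputed input because its replay-accounting formula is not specified in this issue.

```python
import statistics

def gate_mode(speedups, effective_acceptance):
    """Return 'default-on', 'opt-in', or 'disabled' from per-run speedups and acceptance."""
    # Inclusive deciles: the first cut point is the 10th percentile of the samples.
    p10 = statistics.quantiles(speedups, n=10, method="inclusive")[0]
    p50 = statistics.median(speedups)
    if p50 < 1.05 or p10 < 1.00:
        return "disabled"          # speedup gates not met
    if effective_acceptance >= 0.75:
        return "default-on"        # K=4 acceptance gate met after replay accounting
    return "opt-in"                # e.g. code / repetitive workloads only

runs = [1.00, 1.02, 1.05, 1.08, 1.10, 1.12, 1.15, 1.20, 1.25, 1.30, 1.40]
mode_hi = gate_mode(runs, effective_acceptance=0.78)
mode_lo = gate_mode(runs, effective_acceptance=0.70)
mode_slow = gate_mode([0.90, 0.95, 1.10, 1.20], effective_acceptance=0.80)
```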

Ref: d3§5,§6,§7,§8. Builds on `NgramSpeculator`/`MtpVerifier`/`reset_fast()`. Highest-risk experiment; kill early if <0.72 accept.


Issue: #176
