dusterbloom: feat(speculative): DraftModel trait + spec-decode primitives by dusterbloom · Pull Request #15 · dusterbloom/higgs

dusterbloom · 2026-05-04T10:18:23Z

Summary

Adds the DraftModel trait, spec-decode primitives (speculative_step, speculative_loop, accept_prefix), config plumbing (ModelConfig.draft_model / num_draft), and doctor validation for draft-model paths. Foundation for upcoming spec-decode integration.

What lands

crates/higgs-engine/src/speculative.rs (NEW, ~360 lines)
- trait DraftModel: Send — generic interface with prefill, draft, advance, rollback
- accept_prefix(draft_ids, target_ids) — longest-match prefix, returns 1..=K+1 tokens
- speculative_step / speculative_loop — full driver loop generic over dyn DraftModel
- 19 unit tests (MockDraft exercises both step and loop edge cases)
crates/higgs-models/src/lib.rs — AnyCache::trim_by(count: usize) dispatcher (depended on by spec-decode rollback). *Inherited from PR-3 (dusterbloom: feat(cache): AnyCache::trim_by dispatcher for spec-decode rollback #14), included via stack._
crates/higgs/src/config.rs — ModelConfig.draft_model: Option<String> + num_draft: usize fields, default_num_draft helper
crates/higgs/src/main.rs — --draft-model <path> and --num-draft <n> CLI flags
crates/higgs/src/doctor.rs — check_draft_models validates path resolves + warns on batch=true incompatibility (3 new tests via test_model_config(path) helper)
crates/higgs/src/daemon.rs — higgs init template documents the new flags

Senior-Rust hygiene

Library code is clippy-clean without file-level #![allow(...)]. accept_prefix uses iter().zip() lockstep iteration; bounds are enforced by the iterator types, no [i] indexing.
Test module allow is scoped to #[cfg(test)] mod tests only: #[allow(clippy::panic, clippy::unwrap_used, clippy::indexing_slicing, clippy::unreachable)]. Tests legitimately use these — failure → test fail is the intent.
Test setup uses a test_model_config(path) helper instead of duplicating 13-field literals.

Audit context

After auditing the originally-planned 7-commit bundle from feat/magic-canvas against current main, all 7 are net-new on origin/main. This PR ships 3 of 7 that are portable as-is:

Original commit	In this PR	Why
`1161b844` DraftModel trait core	✅	Pure-Rust generic, zero coupling
`054494c2` ModelConfig draft fields	✅	Trivial config additions
`feef8e47` doctor::check_draft_models	✅	Validation, no concrete drafter required
`2121953b` AneBonsaiDraftModel	❌ deferred	Requires `BonsaiQ1Engine` (PR #13)
`5108e8db` SimpleEngine wiring	❌ deferred	Requires `DiffusionEngine::load_q1` (Bonsai)
`835dc291` tokenizer hash gate	❌ deferred	Meaningful only when `5108e8d` lands
`1cee9bd6` E2E spec-decode fixes	❌ deferred	Bonsai-specific bug fixes

The deferred 4 ship in a follow-up PR once PR #13 (Bonsai-Q1) lands upstream.

Stacked on PR-3 (#14)

Includes the AnyCache::trim_by dispatcher from PR #14. Merging that first will reduce this diff.

Test plan

cargo clippy --all-targets --all-features — clean
cargo fmt --all -- --check — clean
cargo test -p higgs-engine --lib speculative -- --test-threads=1 — 19 passed
cargo test -p higgs --lib doctor -- --test-threads=1 — 39 passed

Notes

This PR adds infrastructure but no concrete drafter binding. cargo run -p higgs serve --draft-model <path> will fail at engine load with "no drafter wired" until the follow-up PR lands. The trait + plumbing alone are already useful for downstream consumers.
Co-authored with Claude Opus 4.7

Audit finding (2026-05-04): of the 3 commits originally planned for PR-3 (paged prefix cache, chunked prefill, trim_by), 2 are already on origin/main in superset form via panbanda's PR panbanda#74 squash and the 1514737 TurboQuant landing: - paged_prefix_cache.rs: main has 1072 lines (1072 vs our 855) with TqBlock + slice_axis1 + conv_pos additions on top of our work. - chunked_prefill: main wires it into a 2453-line simple.rs (vs our 1508), with compute_prefill_chunk_size + forward_chunked + the early validation our version was missing. - SteppingKeyValueCache::trim_by(usize): main has the per-layer trim with saturating_sub overflow guard (lib.rs:457). The genuinely-new piece is the AnyCache-level dispatcher: a single helper that walks every layer in either KV or Hybrid variant and trims the underlying SteppingKeyValueCache, while intentionally leaving LayerCache::Arrays (recurrent SSM state) untouched. Required by upcoming spec-decode verify-rollback paths in higgs-engine that operate on AnyCache rather than reaching into per-layer types directly. Tests: - any_cache_trim_by_kv_dispatches_to_each_layer: KV variant with mixed Some/None layers, no panic, all caches saturate to 0. - any_cache_trim_by_hybrid_skips_arrays_layers: Hybrid with KV+Arrays mix, asserts Arrays.offset stays unchanged (recurrent state cannot trim by offset alone). Suite: 332 passed, 0 failed, 24 ignored. Clippy + rustfmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Ports pure-logic speculative-decoding primitives from feat/ane-spec-decode (efa05de + prefill method from 293793a) into higgs-engine: - accept_prefix() — prefix-match acceptance logic - speculative_step() — one draft→verify→accept cycle - speculative_loop() — run until EOS/max_tokens - DraftModel trait: prefill, draft, advance, rollback (all Send) MlxDraftModel CoreML impl intentionally not ported; magic-canvas has native AneBonsaiEngine / AneArDecodeEngine / finalize_ane_gdn_inline paths instead. Bonsai shim lands separately. 19 unit tests (accept_prefix 7, MockDraft 3, step 5, loop 4). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Ports config plumbing from feat/ane-spec-decode 8e3a5b8: - ServeArgs gains --draft-model (Option<String>) and --num-draft (usize, default 8) CLI flags. - ModelConfig gains draft_model + num_draft fields, threaded through build_simple_config, load_config_file, and ensure_auto_router_model. - Test fixtures updated across config.rs, doctor.rs, tui/mod.rs. No load-path wiring yet — that arrives with the simple.rs engine port. MlxDraftModel from the source commit deliberately omitted; magic-canvas uses AneBonsaiDraftModel and native paths instead. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Ports doctor validation from feat/ane-spec-decode 2a7480d: - check_draft_models() — resolves ModelConfig.draft_model path (FAIL if not found) and warns when a draft model is paired with batch=true (speculative decoding is SimpleEngine-only). - Documents draft_model / num_draft in the daemon init template and README.md config reference. 3 new doctor tests covering: unresolvable draft path, batch=true warn, no-draft-model short-circuit. Doctor suite now 29 tests (was 26). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Cherry-picked PR-4 commits had a file-level `#![allow(clippy::unwrap_used, clippy::indexing_slicing, clippy::panic)]` on speculative.rs as a quick fix to satisfy strict workspace deny lints. That's a workaround, not how senior Rust code is written. Refactor: - `accept_prefix`: replace `for i in 0..k { target_ids[i] }` indexing with `target_ids.iter().zip(draft_ids.iter())`. Lock-step iteration, types enforce bounds, no library indexing. The K+1th token (the bonus when every draft matched) is appended via `target_ids.last()` after the loop. - File-level allow: deleted. Library code is clippy-clean. - Test module allow: scoped tightly to `#[cfg(test)] mod tests` only — `#[allow(clippy::panic, clippy::unwrap_used, clippy::indexing_slicing, clippy::unreachable)]`. Tests legitimately use these (failure → test fail, which is the intent). doctor.rs new tests: introduce `test_model_config(path)` helper instead of 13-field literals duplicated 3×. DRY test setup with struct-update syntax overrides for the field each test cares about. Verify: `cargo clippy --all-targets --all-features` clean, `cargo test -p higgs-engine --lib speculative -- --test-threads=1` 19/19, `cargo test -p higgs --lib doctor -- --test-threads=1` 39/39. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dusterbloom and others added 3 commits May 4, 2026 11:53

This was referenced May 4, 2026

feat(speculative): DraftModel trait + spec-decode primitives panbanda/higgs#144

Open

dusterbloom: feat(pld): Prompt Lookup Decoding drafter (1.84× headline) #16

Open

dusterbloom and others added 2 commits May 6, 2026 18:32

dusterbloom force-pushed the dusterbloom/draftmodel-trait branch from 48d31f9 to 96d0059 Compare May 6, 2026 16:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dusterbloom: feat(speculative): DraftModel trait + spec-decode primitives#15

dusterbloom: feat(speculative): DraftModel trait + spec-decode primitives#15
dusterbloom wants to merge 5 commits into
mainfrom
dusterbloom/draftmodel-trait

dusterbloom commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dusterbloom commented May 4, 2026

Summary

What lands

Senior-Rust hygiene

Audit context

Stacked on PR-3 (#14)

Test plan

Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant