feat(inference): composable ModelSpec to IR to KernelPlan with PlanFingerprint zero-cost gate (ADR-059)

Implements **ADR-059** (Composable Layer Architecture) per the d4 design: `ModelSpec → KernelGraph (IR) → fused KernelPlan tape`. Dynamic traits exist only at construction/lowering; the decode loop executes a pre-lowered backend tape with **zero `dyn` dispatch**. North-star: architecture exploration without sacrificing hand-fused Metal speed.
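To make the "dynamic traits only at construction/lowering" boundary concrete, here is a minimal sketch. Every name in it (`TapeOp`, `LayerSpec`, `lower_model`, `run_tape`) is hypothetical and chosen for illustration; none is taken from the codebase or from d4. The idea shown: `dyn` objects are consulted once to emit a flat tape of closed-enum ops, and the hot loop only ever matches on that enum.

```rust
/// One pre-lowered step of the decode tape (hypothetical op set).
/// A closed enum lets the hot loop use `match` instead of a vtable call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TapeOp {
    RmsNorm { dim: u32 },
    FullAttention { heads: u32, head_dim: u32 },
    LinearAttention { heads: u32, head_dim: u32 },
}

/// Dynamic trait: assumed to be used only while building the plan.
pub trait LayerSpec {
    fn lower(&self, tape: &mut Vec<TapeOp>);
}

pub struct FullAttnLayer {
    pub hidden: u32,
    pub heads: u32,
    pub head_dim: u32,
}

pub struct LinearAttnLayer {
    pub hidden: u32,
    pub heads: u32,
    pub head_dim: u32,
}

impl LayerSpec for FullAttnLayer {
    fn lower(&self, tape: &mut Vec<TapeOp>) {
        tape.push(TapeOp::RmsNorm { dim: self.hidden });
        tape.push(TapeOp::FullAttention { heads: self.heads, head_dim: self.head_dim });
    }
}

impl LayerSpec for LinearAttnLayer {
    fn lower(&self, tape: &mut Vec<TapeOp>) {
        tape.push(TapeOp::RmsNorm { dim: self.hidden });
        tape.push(TapeOp::LinearAttention { heads: self.heads, head_dim: self.head_dim });
    }
}

/// Lowering: the only place `dyn LayerSpec` is called.
pub fn lower_model(layers: &[Box<dyn LayerSpec>]) -> Vec<TapeOp> {
    let mut tape = Vec::new();
    for layer in layers {
        layer.lower(&mut tape);
    }
    tape
}

/// Stand-in for the decode loop: walks the tape with static dispatch only
/// and returns the number of ops it would have dispatched.
pub fn run_tape(tape: &[TapeOp]) -> usize {
    let mut dispatches = 0;
    for op in tape {
        match op {
            TapeOp::RmsNorm { .. }
            | TapeOp::FullAttention { .. }
            | TapeOp::LinearAttention { .. } => dispatches += 1,
        }
    }
    dispatches
}
```

In this sketch, supporting a new layer kind means one new `LayerSpec` impl plus, if it needs a new kernel, one new `TapeOp` variant; `run_tape` never sees a trait object.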

### Tasks
- [ ] `ModelSpec`/`AttentionSpec` (TOML/JSON) — express current Qwen3.5 hybrid as `interleave{full_attention_interval=4}`; **validation must allow head_dim 256** (not just 64/80/96/128)
- [ ] minimal kernel-graph IR (RmsNorm/Linear/RoPE/Attention/SwiGLU/ResidualAdd/QuantizeKV/SampleTopK) + epilogue-fusion pass (Linear→Bias?→Act?→Mul?→Residual?)
- [ ] `AttentionVariant` trait (validate/emit_ir/metal_kernel_key/template): a new variant adds 1 trait impl + 1 MSL template, **no decode-loop edits**
- [ ] template-MSL + Metal function constants (avoid the 18K-source-variant explosion)
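The epilogue-fusion pattern `Linear→Bias?→Act?→Mul?→Residual?` from the task list could be sketched as a greedy pass like the one below. This is an illustration under loud assumptions: the IR is simplified to a linear node sequence rather than a graph, each intermediate is assumed to have a single consumer, and `Node`, `FusedLinear`, and `fuse_epilogues` are hypothetical names, not the planned IR types.

```rust
/// Hypothetical, simplified IR node. The real IR is a graph and carries
/// shapes and operands; this sketch keeps only the op kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Node {
    RmsNorm,
    Linear,
    Bias,
    Act,
    Mul,
    ResidualAdd,
    /// A Linear with its absorbed epilogue stages recorded as flags.
    FusedLinear { bias: bool, act: bool, mul: bool, residual: bool },
}

/// Consumes `kind` at position `*i` if it is the next node.
fn take(nodes: &[Node], i: &mut usize, kind: Node) -> bool {
    if nodes.get(*i) == Some(&kind) {
        *i += 1;
        true
    } else {
        false
    }
}

/// Greedy pass: after each Linear, absorb the optional stages in the fixed
/// order Bias, Act, Mul, ResidualAdd, each at most once. Anything that does
/// not fit the pattern is left in place as a standalone node.
pub fn fuse_epilogues(nodes: &[Node]) -> Vec<Node> {
    let mut out = Vec::with_capacity(nodes.len());
    let mut i = 0;
    while i < nodes.len() {
        if nodes[i] != Node::Linear {
            out.push(nodes[i].clone());
            i += 1;
            continue;
        }
        i += 1; // consume the Linear itself
        let bias = take(nodes, &mut i, Node::Bias);
        let act = take(nodes, &mut i, Node::Act);
        let mul = take(nodes, &mut i, Node::Mul);
        let residual = take(nodes, &mut i, Node::ResidualAdd);
        out.push(Node::FusedLinear { bias, act, mul, residual });
    }
    out
}
```

Because the stage order is fixed, a sequence such as `Linear, Mul, Act` fuses only the `Mul` and leaves the `Act` unfused.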

### Acceptance — the zero-cost gate (the regression to prevent)
- composed Qwen3.5 path lowers to the **same `PlanFingerprint`** as the hand path: identical kernel list, dispatch count, command-buffer count
- decode tok/s ≥98.5% of the hand path, geo-mean across ctx buckets (≥97% in every individual bucket); CI fails if the plan contains `GenericLinear`/`GenericAttention`/`HostCopy`/`UnfusedResidual`
- demo: swap RoPE→NoPE / GQA→GDN as config
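One way the acceptance checks above might be wired into CI is sketched here. The struct layout, the function names, and the choice to match forbidden kernels by exact name are assumptions made for illustration; only the three fingerprint components, the four forbidden kernel names, and the 98.5% / 97% thresholds come from this issue.

```rust
/// Hypothetical fingerprint: the three properties the gate compares.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PlanFingerprint {
    pub kernels: Vec<String>,
    pub dispatch_count: usize,
    pub command_buffer_count: usize,
}

/// Kernel names that indicate a fallback (non-fused) path.
pub const FORBIDDEN: [&str; 4] =
    ["GenericLinear", "GenericAttention", "HostCopy", "UnfusedResidual"];

/// Structural gate: no forbidden kernels, and the composed plan must be
/// identical to the hand-written plan.
pub fn check_gate(composed: &PlanFingerprint, hand: &PlanFingerprint) -> Result<(), String> {
    // Exact-name matching is an assumption of this sketch.
    if let Some(k) = composed
        .kernels
        .iter()
        .find(|k| FORBIDDEN.iter().any(|f| *f == k.as_str()))
    {
        return Err(format!("forbidden kernel in plan: {k}"));
    }
    if composed != hand {
        return Err("PlanFingerprint differs from hand path".to_string());
    }
    Ok(())
}

/// Throughput gate. `ratios` holds composed/hand decode tok/s, one entry
/// per context bucket. Passes when every bucket is >= 0.97 and the
/// geometric mean is >= 0.985. An empty slice fails.
pub fn throughput_ok(ratios: &[f64]) -> bool {
    if ratios.is_empty() || ratios.iter().any(|r| *r < 0.97) {
        return false;
    }
    // Geometric mean computed as exp(mean(ln r)).
    let mean_ln = ratios.iter().map(|r| r.ln()).sum::<f64>() / ratios.len() as f64;
    mean_ln.exp() >= 0.985
}
```

Comparing whole fingerprints with `==` keeps the gate strict: any change in the kernel list, dispatch count, or command-buffer count fails, even when throughput happens to hold.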

Ref: d4 (whole), ADR-059. Can start in parallel after the ADR-064 baseline is frozen.

