perf(inference): batched causal prefill attention + elementwise batching by ohdearquant · Pull Request #189 · ohdearquant/lattice

ohdearquant · 2026-06-04T14:10:06Z

batched causal prefill attention + elementwise batching

Stacked on #188. Replaces the per-token decode-attention loop in prefill with a
single batched causal-attention pass, and batches the elementwise prefill ops so they
no longer issue a GPU dispatch per token.
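To make the change concrete, here is a minimal CPU-only sketch of why a single causally masked pass can stand in for the per-token loop. It is not the code in this PR: the function names (`prefill_per_token`, `prefill_batched`), the single-head `Vec<Vec<f32>>` layout, and the absence of any Metal kernel are all assumptions made for illustration.

```rust
// Hypothetical single-head sketch; names and layout are illustrative only.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Decode-style attention for one query at `pos`, over keys/values 0..=pos.
fn attend_one(q: &[f32], k: &[Vec<f32>], v: &[Vec<f32>], pos: usize) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = (0..=pos).map(|j| dot(q, &k[j]) * scale).collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let denom: f32 = exps.iter().sum();
    let mut out = vec![0.0; v[0].len()];
    for (j, e) in exps.iter().enumerate() {
        for (o, x) in out.iter_mut().zip(&v[j]) {
            *o += e / denom * x;
        }
    }
    out
}

/// Old shape: one attention call per prompt position.
fn prefill_per_token(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    (0..q.len()).map(|pos| attend_one(&q[pos], k, v, pos)).collect()
}

/// New shape: one pass over the whole prompt with a causal mask (j > i is masked).
fn prefill_batched(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let n = q.len();
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    // scores[i][j] = q_i . k_j * scale for j <= i, -inf otherwise.
    let mut scores = vec![vec![f32::NEG_INFINITY; n]; n];
    for i in 0..n {
        for j in 0..=i {
            scores[i][j] = dot(&q[i], &k[j]) * scale;
        }
    }
    // Row-wise softmax; masked entries become exp(-inf) = 0.
    for row in scores.iter_mut() {
        let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut denom = 0.0;
        for s in row.iter_mut() {
            *s = (*s - max).exp();
            denom += *s;
        }
        for s in row.iter_mut() {
            *s /= denom;
        }
    }
    // out = softmax(scores) * V
    let dv = v[0].len();
    let mut out = vec![vec![0.0; dv]; n];
    for i in 0..n {
        for j in 0..n {
            for c in 0..dv {
                out[i][c] += scores[i][j] * v[j][c];
            }
        }
    }
    out
}

fn main() {
    let q: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let k: Vec<Vec<f32>> = vec![vec![0.5, 0.1], vec![0.2, 0.9], vec![0.7, 0.3]];
    let v: Vec<Vec<f32>> = vec![vec![1.0, 2.0], vec![3.0, 4.0], vec![5.0, 6.0]];
    let a = prefill_per_token(&q, &k, &v);
    let b = prefill_batched(&q, &k, &v);
    for (ra, rb) in a.iter().zip(&b) {
        for (x, y) in ra.iter().zip(rb) {
            assert!((x - y).abs() < 1e-6);
        }
    }
    println!("per-token and batched causal attention agree");
}
```

On a GPU the batched form would presumably map to one kernel launch over the whole prompt instead of one launch per position, which is where the saving comes from; the sketch only demonstrates that the two formulations compute the same outputs.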

Why

After the chunked-prefill fix, the remaining per-token cost in prefill was the
attention loop and the elementwise ops running once per position. This collapses both
into batched passes over the prompt.
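The elementwise half of the change can be sketched the same way. The snippet below is an illustration under stated assumptions, not this PR's code: `Counter` is a hypothetical stand-in for a GPU queue that only counts dispatches, and SiLU is picked arbitrarily as the elementwise op.

```rust
/// Hypothetical stand-in for a GPU queue: each `run` call models one dispatch.
struct Counter {
    dispatches: usize,
}

impl Counter {
    fn run(&mut self, x: &mut [f32], f: impl Fn(f32) -> f32) {
        self.dispatches += 1;
        for v in x.iter_mut() {
            *v = f(*v);
        }
    }
}

/// Example elementwise op (an assumption; any per-element function works).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Old shape: one dispatch per token row of length `hidden`.
fn per_token(gpu: &mut Counter, acts: &mut [f32], hidden: usize) {
    for row in acts.chunks_mut(hidden) {
        gpu.run(row, silu);
    }
}

/// New shape: one dispatch over the flattened [seq_len * hidden] buffer.
fn batched(gpu: &mut Counter, acts: &mut [f32]) {
    gpu.run(acts, silu);
}

fn main() {
    let (seq_len, hidden) = (4, 3);
    let input: Vec<f32> = (0..seq_len * hidden).map(|i| i as f32 * 0.25 - 1.0).collect();

    let mut a = input.clone();
    let mut gpu_a = Counter { dispatches: 0 };
    per_token(&mut gpu_a, &mut a, hidden);

    let mut b = input.clone();
    let mut gpu_b = Counter { dispatches: 0 };
    batched(&mut gpu_b, &mut b);

    assert_eq!(a, b); // same values either way
    println!("per-token: {} dispatches, batched: {}", gpu_a.dispatches, gpu_b.dispatches);
}
```

Because an elementwise op has no dependency between positions, flattening the prompt into one buffer changes only the number of launches, not the result.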

Result

Additional prefill speedup on top of PR1 (#188). The cumulative interleaved A/B is
tracked internally; a fresh same-process A/B will be attached before merge rather than
quoting a number I can't reproduce in-session. The decode path is unchanged, and prefill
argmax parity is preserved across runs.
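For readers unfamiliar with the term, an argmax parity check can be sketched as below. This is a hypothetical helper, not the harness used for this PR: `argmax` and `argmax_parity` are illustrative names, and the logits in `main` are made-up values.

```rust
/// Index of the largest logit, or None for an empty slice.
/// Ties resolve to the last maximal index (behaviour of `Iterator::max_by`).
fn argmax(logits: &[f32]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
}

/// True when both logit vectors are non-empty and select the same token.
fn argmax_parity(reference: &[f32], candidate: &[f32]) -> bool {
    match (argmax(reference), argmax(candidate)) {
        (Some(a), Some(b)) => a == b,
        _ => false,
    }
}

fn main() {
    // Made-up logits: small numeric drift, same winning token.
    let old_path = [0.10_f32, 2.50, -1.00, 0.70];
    let new_path = [0.11_f32, 2.49, -1.02, 0.71];
    assert!(argmax_parity(&old_path, &new_path));
    println!("argmax parity holds");
}
```

Parity on the argmax is a weaker condition than bitwise-equal logits: it tolerates small floating-point drift from reordered reductions while still catching changes that would alter the greedily selected token.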

Notes

Removes an unnecessary unsafe in the attention fallback.
No new crates; no library unwrap().

Replaces the per-token decode-attention loop in prefill with a single batched causal attention pass, and batches the elementwise prefill ops to eliminate per-token GPU dispatches. Removes an unnecessary unsafe in the attn fallback. Decode unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This was referenced Jun 4, 2026

perf(inference): chunked batched prefill for long prompts on Metal #188

Open

perf(inference): chunked-parallel GatedDeltaNet scan (race-fixed, gated) #190

Open

ohdearquant wants to merge 1 commit into pr/prefill-1-chunk from pr/prefill-2-attn
