refactor(inference): decompose metal_qwen35.rs (15K→8 files)

## Motivation

`metal_qwen35.rs` is 15,484 lines. Debugging the lm_head throughput regression (#151) meant grepping the whole file for dispatch paths, because kernel definitions, dispatch code, and constructor logic are interleaved. This will only get worse as we add MoE, MTP, vision, and grammar support.

## Proposed Structure

```
crates/inference/src/forward/metal_qwen35/
├── mod.rs              ~  250 lines  (cfg gate + pub use inner::*)
├── shaders.rs          ~ 2300 lines  (MSL_SOURCE, MSL_Q4_TILED_SOURCE)
├── types.rs            ~ 1300 lines  (structs, enums, data shapes)
├── engine.rs           ~ 1100 lines  (MetalQwen35Engine::new, buffer utils, KV cache)
├── constructors.rs     ~ 2200 lines  (MetalQwen35State::new, from_q4_dir, LoRA lifecycle)
├── forward.rs          ~ 2900 lines  (encode_gdn_layer, encode_gqa_layer, generate, prefill)
├── dispatch.rs         ~  900 lines  (all dispatch_* helpers)
├── sampling.rs         ~  480 lines  (chat_completion, generate_streaming, PPL)
└── tests.rs            ~ 3200 lines  (#[cfg(test)] mod tests)
```

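Wiring: `mod.rs` declares the submodules and keeps a `mod inner { ... }` block that glob-imports them (e.g. `pub use super::types::*;`). This preserves the flat namespace, so existing cross-references compile without path qualification. Two details make this work. First, the imports in `inner` must be `pub use`: a plain `use` is private to `inner`, so `pub use inner::*` would re-export nothing. Second, each split-out file needs something like `use super::*;` so its bodies see the same names the monolith did.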
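A minimal single-file sketch of this wiring follows. It assumes Rust 2018+ path rules. Each inline `mod` stands in for one file under `metal_qwen35/`. `KvCacheShape`, `dispatch_threadgroups`, `encode_all_layers`, `shader_bytes`, and `public_api_pin` are hypothetical placeholders, not items from the real file.

The sketch shows three things the split has to get right:

- Sibling files see each other through `use super::*`.
- Inherent `impl` blocks for one type can be spread across files.
- A helper that was private in the monolith needs `pub(super)` once its caller lives in another file.

The `cfg` gate is left out so the sketch compiles on any platform.

```rust
// Hypothetical sketch of metal_qwen35/mod.rs after the split.
// Each inline `mod` stands in for one file; all item names are placeholders.
// In the real crate the module would stay behind
// #[cfg(all(target_os = "macos", feature = "metal-gpu"))].
pub mod metal_qwen35 {
    // shaders.rs: only siblings need this, so pub(super) keeps it out of the public API.
    mod shaders {
        pub(super) const MSL_SOURCE: &str = "kernel void placeholder() {}";
    }

    // types.rs
    mod types {
        pub struct KvCacheShape {
            pub layers: usize,
            pub kv_heads: usize,
        }
    }

    // engine.rs: `use super::*` gives this file the flat namespace the
    // monolith had, so unqualified names like `KvCacheShape` still resolve.
    mod engine {
        use super::*;

        pub struct MetalQwen35Engine {
            pub kv: KvCacheShape,
            pub shader_bytes: usize,
        }

        impl MetalQwen35Engine {
            pub fn new(layers: usize, kv_heads: usize) -> Self {
                Self {
                    kv: KvCacheShape { layers, kv_heads },
                    shader_bytes: shaders::MSL_SOURCE.len(),
                }
            }
        }
    }

    // dispatch.rs: a second inherent impl block for the same type.
    mod dispatch {
        use super::*;

        impl MetalQwen35Engine {
            // Private in the monolith; pub(super) lets forward.rs call it.
            pub(super) fn dispatch_threadgroups(&self) -> usize {
                self.kv.layers * self.kv.kv_heads
            }
        }
    }

    // forward.rs: calls a helper that now lives in dispatch.rs.
    mod forward {
        use super::*;

        impl MetalQwen35Engine {
            pub fn encode_all_layers(&self) -> usize {
                self.dispatch_threadgroups()
            }
        }
    }

    // The flat namespace. These must be `pub use` for the facade glob to see them.
    mod inner {
        pub use super::engine::*;
        pub use super::types::*;
    }

    // Facade: the public re-export stays exactly as before.
    pub use inner::*;
}

// Compile-time pin of the public surface: if a re-export is dropped or
// renamed during the split, this function stops compiling.
#[allow(dead_code)]
fn public_api_pin() {
    let _: Option<metal_qwen35::MetalQwen35Engine> = None;
    let _: Option<metal_qwen35::KvCacheShape> = None;
}
```

A pin function like `public_api_pin`, listing every name currently reachable as `metal_qwen35::Name`, is one cheap way to back the "no public API changes" acceptance item. It fails at compile time rather than in a test run.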

## Constraints

- Zero behavioral change — pure structural refactor
- `pub use inner::*` re-exports unchanged
- All existing tests pass without modification
- Feature gate `#[cfg(all(target_os = "macos", feature = "metal-gpu"))]` stays in mod.rs

## Acceptance

- [ ] `make ci` passes
- [ ] `bench_decode_ab` shows no throughput change (within noise)
- [ ] PPL golden test passes
- [ ] No public API changes (same re-exports)
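One way to make "within noise" concrete when reading `bench_decode_ab` results is to compare median decode throughput across repeated runs against a relative tolerance. The sketch below assumes tokens/s samples are collected per run for both the baseline and the split build. `median`, `within_noise`, and the 2% tolerance are assumptions, not existing project helpers or thresholds.

```rust
/// Median of a non-empty slice of throughput samples (tokens/s).
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    // partial_cmp returns None only for NaN, which a valid benchmark should never emit.
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("NaN in samples"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// True if the candidate's median throughput is within `tolerance`
/// (a fraction, e.g. 0.02 for 2%) of the baseline's median.
/// Assumes the baseline median is positive.
fn within_noise(baseline_tps: &[f64], candidate_tps: &[f64], tolerance: f64) -> bool {
    let base = median(baseline_tps);
    let cand = median(candidate_tps);
    ((cand - base) / base).abs() <= tolerance
}
```

Medians are less sensitive than means to a single warm-up or thermally throttled run, which helps when A and B are measured back to back on the same machine.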

refactor(inference): decompose metal_qwen35.rs (15K→8 files) #152

Motivation

Proposed Structure

Constraints

Acceptance

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

refactor(inference): decompose metal_qwen35.rs (15K→8 files) #152

Description

Motivation

Proposed Structure

Constraints

Acceptance

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions