feat(qwen3_next): mixed-bit Qwen3.5 GDN BA loading fallback #148
📝 Walkthrough
Qwen3.5 MoE/VLM loading now detects per-layer mixed-bit BA quantization for GDN projections during config parsing and uses a runtime path that tries fused projection loading first, then falls back to separate projections when BA fusion is invalid due to mismatched shapes or quantization.
Changes
GDN Mixed-Bit Quantization Handling
Sequence Diagram
sequenceDiagram
participant Config as Config Parser
participant Runtime as Runtime Loader
participant FusedLoader as Fused GDN Loader
participant DirectLoader as Direct Loader
participant Fallback as Fallback Handler
Config->>Config: Scan per-layer quantization overrides
Config->>Runtime: Set use_separate_gdn_projections flag
Runtime->>Runtime: Load model config
alt use_separate_gdn_projections forced
Runtime->>DirectLoader: Load with separate projections
DirectLoader-->>Runtime: Weights loaded
else Attempt fused path
Runtime->>FusedLoader: Attempt fused GDN loading
FusedLoader->>FusedLoader: Check can_concatenate_axis0(in_proj_a, in_proj_b)
alt Shapes compatible
FusedLoader->>FusedLoader: Fuse projections
FusedLoader-->>Runtime: Weights loaded (fused)
else Shapes incompatible or mixed-bit
FusedLoader-->>Fallback: Return ShapeMismatch / fusion error
Fallback->>Runtime: Rebuild config with separate_gdn=true
Runtime->>DirectLoader: Reload via direct loader
DirectLoader-->>Runtime: Weights loaded (separate)
end
end
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/higgs-models/src/qwen3_next.rs`:
- Around line 3731-3736: The code currently unconditionally computes
use_separate and inserts "use_separate_gdn_projections" into map, overwriting
any config-provided value; change this to preserve an existing config entry:
before calling map.insert for the key "use_separate_gdn_projections" (around the
variable use_separate and map.insert), check whether map already contains that
key or whether map.get("use_separate_gdn_projections") is
Some(serde_json::Value::Bool(_)); only insert the computed
serde_json::Value::from(use_separate) when the key is absent (or not a boolean),
so the config-provided true/false is not clobbered by the env/mixed-bit
detection logic.
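The fix this comment asks for can be sketched as follows. This is a minimal illustration, not the PR's code: the `Value` enum is a stand-in for `serde_json::Value`, a `BTreeMap` stands in for the JSON object map, and `set_separate_gdn_default` is a hypothetical helper name.

```rust
use std::collections::BTreeMap;

/// Stand-in for `serde_json::Value`, reduced to what this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Str(String),
}

const KEY: &str = "use_separate_gdn_projections";

/// Insert the computed flag only when the config holds no boolean for it,
/// so an explicit `true`/`false` from the config is never clobbered.
/// (Hypothetical helper; the real code operates on the parsed JSON map.)
fn set_separate_gdn_default(map: &mut BTreeMap<String, Value>, computed: bool) {
    let has_bool = matches!(map.get(KEY), Some(Value::Bool(_)));
    if !has_bool {
        map.insert(KEY.to_string(), Value::Bool(computed));
    }
}
```

With this shape, a config that sets the key to `false` keeps `false` even when the env var or mixed-bit detection computes `true`; a missing or non-boolean entry is replaced by the computed value.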
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d5bd8595-b442-4013-9fb3-29e3c1e8d6e1
📒 Files selected for processing (1)
crates/higgs-models/src/qwen3_next.rs
Adds a fallback path for loading Qwen3.5 models with mixed-bit GDN
projection weights (some layers q4, some q8 — common in unsloth's
dynamic-quant variants). The default fused-projection loader fuses
`in_proj_a` + `in_proj_b` into a single matmul; mixed-bit weights
have incompatible shapes and the fusion fails.
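To make the shape incompatibility concrete: assuming an MLX-style layout in which quantized values are packed into 32-bit words, a row of `in_features` values occupies `in_features * bits / 32` packed columns, so a q4 and a q8 projection of the same logical width end up with different packed widths. The sketch below is illustrative only; `packed_cols` is a hypothetical helper and the signature shown for `can_concatenate_axis0` is an assumption, not the one in the PR.

```rust
/// Packed column count for one quantized weight row, assuming values are
/// packed into 32-bit words (an assumption about the storage layout).
fn packed_cols(in_features: usize, bits: usize) -> Option<usize> {
    let total_bits = in_features.checked_mul(bits)?;
    (total_bits % 32 == 0).then_some(total_bits / 32)
}

/// Two arrays can be stacked along axis 0 only if they have the same rank
/// (at least 1) and every dimension after the first matches.
/// Hypothetical signature; the real guard may inspect more than shapes.
fn can_concatenate_axis0(a: &[usize], b: &[usize]) -> bool {
    !a.is_empty() && a.len() == b.len() && a[1..] == b[1..]
}
```

For example, with 2048 input features a q4 row packs to 256 columns and a q8 row to 512, so shapes such as `[32, 256]` and `[32, 512]` fail the guard, which is the case the fused loader reports as a `ShapeMismatch`.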
Behaviour:
1. Detect via `is_mixed_bit_gdn_ba_fusion_error` — matches a
`ModelError::ShapeMismatch` whose message contains both
`in_proj_ba` and `requires separate GDN projections`.
2. On detection, retry the load with
`args.use_separate_gdn_projections = true`, taking the
`load_qwen3_5_moe_weights_direct` path. Forward dispatches go from
2 to 4 GDN ops per layer — slightly slower but correct.
3. Forced separate (via `args.use_separate_gdn_projections` config or
`HIGGS_SEPARATE_GDN_PROJ` env var) skips the fused attempt
entirely.
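The three behaviours above can be sketched as one control flow. This is a simplified sketch under stated assumptions: `ModelError` here is a two-variant stand-in for the crate's real error type, and `load_with_gdn_fallback` takes the two loaders as closures, whereas the real `load_qwen3_5_model_with_gdn_fallback` works on model args and weight files.

```rust
/// Stand-in for the crate's error type (assumed shape, for illustration).
#[derive(Debug)]
enum ModelError {
    ShapeMismatch(String),
    Other(String),
}

/// True only for the diagnostic error that signals mixed-bit BA fusion.
fn is_mixed_bit_gdn_ba_fusion_error(err: &ModelError) -> bool {
    match err {
        ModelError::ShapeMismatch(msg) => {
            msg.contains("in_proj_ba") && msg.contains("requires separate GDN projections")
        }
        ModelError::Other(_) => false,
    }
}

/// Try the fused loader first unless separate projections are forced;
/// retry with the direct loader only on the specific fusion error.
fn load_with_gdn_fallback<M>(
    force_separate: bool,
    load_fused: impl FnOnce() -> Result<M, ModelError>,
    load_direct: impl FnOnce() -> Result<M, ModelError>,
) -> Result<M, ModelError> {
    if force_separate {
        return load_direct();
    }
    match load_fused() {
        Ok(model) => Ok(model),
        Err(err) if is_mixed_bit_gdn_ba_fusion_error(&err) => load_direct(),
        Err(err) => Err(err),
    }
}
```

Matching on the message text keeps unrelated shape mismatches fatal, so a genuinely corrupt checkpoint is not silently retried through the slower path.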
Also adds:
* `qwen3_5_quantization_config` — parses `{group_size, bits}` from
the per-layer `quantization` map in `config.json`.
* `qwen3_5_mixed_ba_quantization_layers` — scans for the layers
where `in_proj_a` and `in_proj_b` differ in bits or group_size.
* `can_concatenate_axis0` — guard used inside
`load_qwen3_5_moe_weights_fused` to emit the diagnostic
`ShapeMismatch` error rather than panicking on the concat.
* `load_qwen3_5_model_with_gdn_fallback` — private helper called by
both `load_qwen3_5_model` (dense) and `load_qwen3_5_moe_model`
(MoE), unifying the fallback path.
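As a rough illustration of the mixed-layer scan, the sketch below compares the effective quantization of `in_proj_a` and `in_proj_b` per layer, falling back to the model-wide default when a layer has no override. The `QuantConfig` struct, the `mixed_ba_layers` name and signature, and the exact key layout are assumptions made for this example; the real helper reads the `quantization` map from `config.json`.

```rust
use std::collections::HashMap;

/// Per-layer quantization parameters, as `{group_size, bits}`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct QuantConfig {
    group_size: u32,
    bits: u32,
}

/// Indices of layers whose `in_proj_a` and `in_proj_b` settings differ
/// in bits or group size. Key layout and signature are assumptions.
fn mixed_ba_layers(
    overrides: &HashMap<String, QuantConfig>,
    default_quant: QuantConfig,
    num_hidden_layers: usize,
) -> Vec<usize> {
    (0..num_hidden_layers)
        .filter(|layer_idx| {
            let prefix = format!("model.layers.{layer_idx}.linear_attn");
            // A missing override means the layer uses the default config.
            let lookup = |proj: &str| {
                overrides
                    .get(&format!("{prefix}.{proj}"))
                    .copied()
                    .unwrap_or(default_quant)
            };
            lookup("in_proj_a") != lookup("in_proj_b")
        })
        .collect()
}
```

A non-empty result is what would set `use_separate_gdn_projections` during config parsing, before any weights are read.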
Adaptations from feat/magic-canvas → origin/main:
* The dense `load_qwen3_5_model` previously only honoured the env
var; now it honours `args.use_separate_gdn_projections` too,
matching the MoE path. Strict improvement: the config flag is set
only by the env var or by mixed-bit detection.
* No `unwrap()`, no `as` casts (use `i32::try_from`); match arms
enumerate variants. No file-level allows added.
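The cast hygiene mentioned above can be illustrated with a small sketch. The helper names `dim_to_i32` and `fused_rows` are hypothetical and the `String` error type is a simplification; the point is only that `i32::try_from` turns an out-of-range dimension into an error instead of wrapping silently as an `as` cast would.

```rust
/// Convert a dimension to `i32` without an `as` cast; out-of-range values
/// become an error instead of silently wrapping.
fn dim_to_i32(dim: usize) -> Result<i32, String> {
    i32::try_from(dim).map_err(|e| format!("dimension {dim} does not fit in i32: {e}"))
}

/// Row count of two arrays stacked along axis 0, with overflow checked.
fn fused_rows(a_rows: usize, b_rows: usize) -> Result<i32, String> {
    let total = a_rows
        .checked_add(b_rows)
        .ok_or_else(|| "row count overflow".to_string())?;
    dim_to_i32(total)
}
```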
Verification on origin/main (rustc 1.95.0):
* `cargo check -p higgs-models` — clean
* `cargo clippy --all-targets --all-features -- -D warnings` — clean
* `cargo fmt --check` — clean
* `cargo test -p higgs-models --lib` — 333/333 pass (3 new)
Source: feat/magic-canvas commit `061e500c`. Direct cherry-pick had 5
conflict regions because origin/main has evolved the load functions
independently; this is a manual surgical port that preserves
origin/main's structure while adding the fallback behaviour.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7497649 to 358ac62
Actionable comments posted: 1
Inline comments:
In `@crates/higgs-models/src/qwen3_next.rs`:
- Around line 3787-3797: The current layer-scan in the closure only looks up
quantization keys with the "language_model.model.layers.{layer_idx}.linear_attn"
prefix and therefore misses checkpoints that use the unprefixed
"model.layers.{layer_idx}.linear_attn" layout; update the closure (the filter
over (0..num_hidden_layers) that computes a_quant and b_quant) to try both key
forms when calling quant.get(...).and_then(qwen3_5_quantization_config) (i.e.,
check quant.get(prefixed_key) first and fall back to quant.get(unprefixed_key)
before defaulting to default_quant.clone()), mirroring the behavior already used
by gate_quantization_override so mixed-BA detection runs for both prefixed and
unprefixed checkpoints.
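The two-key lookup this comment describes might look like the following. It is a sketch under assumptions: the helper name `lookup_linear_attn_quant` is hypothetical, the map is a plain `HashMap` rather than the parsed JSON object, and the `.{proj}` key suffix is assumed from the projection names.

```rust
use std::collections::HashMap;

/// Look up a per-layer override under the prefixed key first, then the
/// unprefixed one. Generic over the value type to stay self-contained.
fn lookup_linear_attn_quant<'a, V>(
    quant: &'a HashMap<String, V>,
    layer_idx: usize,
    proj: &str,
) -> Option<&'a V> {
    let unprefixed = format!("model.layers.{layer_idx}.linear_attn.{proj}");
    let prefixed = format!("language_model.{unprefixed}");
    quant.get(&prefixed).or_else(|| quant.get(&unprefixed))
}
```

The caller would still fall back to the default quantization when both lookups return `None`, so detection behaves the same for either checkpoint layout.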
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5fe2c153-be5c-448d-bb7c-f4c105efe65d
📒 Files selected for processing (1)
crates/higgs-models/src/qwen3_next.rs
358ac62 to 45ece14
45ece14 to dd9cf7a
Summary
Adds a load-time fallback for Qwen3.5 models with mixed-bit GDN BA projections (some layers q4, some q8; common in unsloth's dynamic-quant variants). The default fused loader concatenates `in_proj_a` + `in_proj_b` into a single matmul; mixed-bit weights have incompatible shapes, so fusion fails. This PR detects that specific failure and retries with separate (4-dispatch) GDN projections.
PR-1c of the `feat/magic-canvas` split, a deferred follow-up from #141's audit (source commit `061e500c`).
Behaviour
1. Default: `load_qwen3_5_moe_weights_fused` (2 GDN dispatches per layer).
2. On a `ModelError::ShapeMismatch` containing both `"in_proj_ba"` and `"requires separate GDN projections"`, retry with `args.use_separate_gdn_projections = true` and `load_qwen3_5_moe_weights_direct` (4 dispatches per layer).
3. Forced separate (`HIGGS_SEPARATE_GDN_PROJ` env or `args.use_separate_gdn_projections`) skips the fused attempt entirely.
What's added
* `is_mixed_bit_gdn_ba_fusion_error`: matches the diagnostic shape-mismatch error
* `qwen3_5_quantization_config`: parses `{group_size, bits}` from the per-layer `quantization` map
* `qwen3_5_mixed_ba_quantization_layers`: finds layers where `in_proj_a` and `in_proj_b` differ
* `can_concatenate_axis0`: guard inside `load_qwen3_5_moe_weights_fused` to emit the diagnostic error rather than panic
* `load_qwen3_5_model_with_gdn_fallback`: private helper called by both dense and MoE load paths
Adaptations to origin/main
* Dense `load_qwen3_5_model` on origin/main only honoured the env var; it now also honours `args.use_separate_gdn_projections` (matching MoE behaviour). Strict improvement: the flag is set only by env or by mixed-bit detection.
* Direct cherry-pick of `061e500c` produced 5 conflict regions because origin/main has evolved the load functions independently. This is a manual surgical port preserving origin/main's structure.
Hygiene
* No `unwrap()` on `Result`/`Option`
* No `as` casts in shape-arithmetic paths (`i32::try_from`)
Test plan
* `cargo check -p higgs-models`: clean
* `cargo clippy --all-targets --all-features -- -D warnings`: clean (rustc 1.95.0, matches CI)
* `cargo fmt --check`: clean
* `cargo test -p higgs-models --lib`: 333/333 pass (3 new)
* Mixed-bit checkpoint (`Brooooooklyn/Qwen3.5-27B-unsloth-mlx`): confirm the warn-then-retry path triggers and the model loads.
Context
Part of the `feat/magic-canvas` PR split. Prior PRs in the chain: #141 (fused MoE), #142 (Bonsai-Q1 fp16 +2.83×), #143 (AnyCache::trim_by), #144 (DraftModel trait), #145 (PLD drafter), #147 (DFlash drafter foundation).
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests