Fix Qwen3.5 MTP sanitize norm shift by xxxkkw · Pull Request #1320 · ml-explore/mlx-lm

xxxkkw · 2026-05-28T15:51:44Z

Summary

- Stop treating the presence of mtp.* weights (which sanitize strips anyway) as a signal to shift Qwen3.5 norm weights by +1.
- Apply the raw-checkpoint norm shift only when sanitize also has to convert Conv1d weights from the unsanitized layout.
- Cover both the qwen3_5 and qwen3_5_moe wrapper paths, so already-converted checkpoints that still contain MTP weights load without their norms being shifted twice.
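The summary can be illustrated with a small sanitize function that derives the raw-checkpoint signal only from Conv1d layout. This is a minimal sketch under stated assumptions, not the mlx-lm code: weights are plain nested Python lists instead of mlx arrays, raw depthwise Conv1d weights are assumed to have shape (channels, 1, kernel) and converted ones (channels, kernel, 1), the +1 shift is assumed to exist because raw norm weights are stored as offsets from 1, and the names sanitize, NORM_SUFFIXES, is_unsanitized_conv1d, to_channels_last and the mtp.fc.weight key are all hypothetical.

```python
# Minimal sketch, not the mlx-lm implementation. Assumptions: weights are
# plain nested lists, raw depthwise Conv1d weights are (channels, 1, kernel),
# converted ones are (channels, kernel, 1), and all key names are examples.

# Hypothetical list of norm-weight suffixes that need the +1 shift.
NORM_SUFFIXES = (
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
    "model.norm.weight",
)


def is_unsanitized_conv1d(key, value):
    # A raw checkpoint still has the kernel size on the last axis.
    return "conv1d.weight" in key and len(value[0][0]) != 1


def to_channels_last(value):
    # (channels, 1, kernel) -> (channels, kernel, 1), like moveaxis(2, 1).
    return [[[row[0][k]] for k in range(len(row[0]))] for row in value]


def sanitize(weights):
    # The raw-checkpoint signal comes only from Conv1d layout; mtp.* keys
    # are no longer consulted.
    is_raw = any(is_unsanitized_conv1d(k, v) for k, v in weights.items())

    out = {}
    for k, v in weights.items():
        if "mtp." in k:
            continue  # drop the MTP weights regardless of checkpoint state
        if is_unsanitized_conv1d(k, v):
            v = to_channels_last(v)
        if is_raw and k.endswith(NORM_SUFFIXES):
            # Assumed: raw norm weights are stored as offsets from 1.
            v = [x + 1.0 for x in v]
        out[k] = v
    return out


raw = {
    "model.layers.0.linear_attn.conv1d.weight": [[[0.1, 0.2, 0.3]]],
    "model.layers.0.input_layernorm.weight": [0.0, 0.5],
    "mtp.fc.weight": [[1.0]],  # hypothetical MTP key
}
converted = sanitize(raw)
# Reload the converted weights with an MTP tensor still present.
reloaded = sanitize({**converted, "mtp.fc.weight": [[1.0]]})
print(converted["model.layers.0.input_layernorm.weight"])  # [1.0, 1.5]
print(reloaded == converted)  # True
```

Because the shift is keyed to a condition that conversion itself removes (the Conv1d layout), running the sketch a second time is a no-op for norms, whether or not MTP weights are present.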

Environment

OS: macOS Darwin 25.4.0 arm64
Hardware: Apple M1 Max, 32 GB unified memory
Python: local project virtual environment

Testing

python -m unittest discover -s tests -p test_models.py -k qwen3_5_family_convert_then_load_norm_not_shift_twice
- Result: 1 passed

Benchmark / profiling notes

This is a sanitize/load correctness fix and does not change inference kernels.
The regression test runs the load-time sanitize path on an already-converted checkpoint whose norm weights are already shifted and which still contains MTP weights (which sanitize strips). Before this change, the MTP keys alone triggered a second norm shift.
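The double shift can be reproduced by comparing the two decision rules on an already-converted checkpoint. This hypothetical sketch models only the shift decision, with plain lists standing in for mlx arrays; old_should_shift encodes the pre-fix rule as this PR describes it (MTP keys or unconverted Conv1d weights), new_should_shift the post-fix rule, and every function and key name is illustrative rather than taken from mlx-lm.

```python
# Hypothetical illustration: only the norm-shift decision is modelled.
# Assumption: converted Conv1d weights have a trailing axis of size 1.


def has_raw_conv1d(weights):
    # Unconverted Conv1d weights keep the kernel size on the last axis.
    return any(
        "conv1d.weight" in k and len(v[0][0]) != 1 for k, v in weights.items()
    )


def old_should_shift(weights):
    # Pre-fix rule as described in the PR: MTP keys also meant "raw".
    return any("mtp." in k for k in weights) or has_raw_conv1d(weights)


def new_should_shift(weights):
    # Post-fix rule: only unconverted Conv1d weights mean "raw".
    return has_raw_conv1d(weights)


def load_norm(weights, should_shift):
    w = weights["model.norm.weight"]
    return [x + 1.0 for x in w] if should_shift(weights) else w


# Already converted: Conv1d is channels-last and norms are already shifted,
# but the file still carries an MTP tensor.
converted = {
    "model.layers.0.linear_attn.conv1d.weight": [[[0.1], [0.2]]],
    "model.norm.weight": [1.0, 1.5],
    "mtp.fc.weight": [[0.3]],
}

print(load_norm(converted, old_should_shift))  # [2.0, 2.5]: shifted twice
print(load_norm(converted, new_should_shift))  # [1.0, 1.5]: unchanged
```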

Context

Full native MTP decoding is already being discussed in #990, "Native MTP speculative decoding (Qwen3.5/3.6 reference implementation)"; this PR extracts only the small sanitize correctness fix so that it can be reviewed on its own.


Commit c437fe4: Fix Qwen3.5 MTP sanitize norm shift

Drop stripped MTP weights without treating their presence as a raw-checkpoint signal that shifts normalized weights during load sanitize.


xxxkkw wants to merge 1 commit into ml-explore:main from xxxkkw:qwen35-mtp-sanitize-norm

xxxkkw commented May 28, 2026
