Fix Qwen3.5 MTP sanitize norm shift #1320

Open
xxxkkw wants to merge 1 commit into ml-explore:main from xxxkkw:qwen35-mtp-sanitize-norm

@xxxkkw xxxkkw commented May 28, 2026

Summary

  • Stop using stripped mtp.* weights as a signal to shift Qwen3.5 norm weights by +1 during sanitize.
  • Keep the raw-checkpoint norm shift tied to unsanitized Conv1d layout conversion.
  • Cover both qwen3_5 and qwen3_5_moe wrapper paths so already-converted checkpoints with MTP weights can be loaded without shifting norms twice.
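The intended behavior can be sketched roughly as follows. This is a simplified, standard-library-only illustration and not the actual mlx-lm implementation: tensors are modeled as nested Python lists, and the key suffixes (`conv1d.weight`, `norm.weight`), the example key names, and the helper `is_raw_conv1d` are assumptions made for the example.

```python
# Illustrative sketch only: names and layouts here are assumptions,
# not the real mlx-lm sanitize code.
MTP_MARKER = "mtp."             # keys containing this are dropped
CONV_SUFFIX = "conv1d.weight"   # assumed suffix for Conv1d weights
NORM_SUFFIX = "norm.weight"     # assumed suffix for norm weights


def is_raw_conv1d(w):
    """Treat a [out][in][kernel] weight whose last dim is not 1 as unconverted."""
    return len(w[0][0]) != 1


def sanitize(weights):
    # The only raw-checkpoint signal is an unconverted Conv1d layout.
    # The presence of MTP keys is deliberately NOT used as a signal.
    has_raw_conv = any(
        k.endswith(CONV_SUFFIX) and is_raw_conv1d(v) for k, v in weights.items()
    )
    out = {}
    for k, v in weights.items():
        if MTP_MARKER in k:
            continue  # drop stripped MTP weights
        if k.endswith(CONV_SUFFIX) and is_raw_conv1d(v):
            # Swap the last two axes: [out][in][kernel] -> [out][kernel][in].
            v = [[list(col) for col in zip(*channel)] for channel in v]
        elif has_raw_conv and k.endswith(NORM_SUFFIX):
            v = [x + 1.0 for x in v]  # norm shift applied once, for raw checkpoints
        out[k] = v
    return out


raw = {
    "layers.0.linear_attn.conv1d.weight": [[[1.0, 2.0, 3.0]]],
    "layers.0.input_layernorm.weight": [0.0, 0.5],
    "mtp.fc.weight": [9.0],
}
converted = sanitize(raw)
# Loading an already-converted checkpoint that still carries an MTP key
# must not shift the norm weights a second time.
reloaded = sanitize({**converted, "mtp.fc.weight": [9.0]})
```

In this sketch, `converted` and `reloaded` hold identical norm values, which is the property the regression test in this PR is meant to guard.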

Environment

  • OS: macOS Darwin 25.4.0 arm64
  • Hardware: Apple M1 Max, 32 GB unified memory
  • Python: local project virtual environment

Testing

  • python -m unittest discover -s tests -p test_models.py -k qwen3_5_family_convert_then_load_norm_not_shift_twice
    • Result: 1 passed

Benchmark / profiling notes

  • This is a sanitize/load correctness fix and does not change inference kernels.
  • The regression test reproduces the load-sanitize path with already-converted norm weights plus stripped MTP weights; before this change the MTP key alone triggered an extra norm shift.

Context

Drop stripped MTP weights during load sanitize without treating their presence as a raw-checkpoint signal that shifts norm weights.