
Add 397B compatibility #4

Open

paragon-of-brah wants to merge 2 commits into am17an:mtp-clean from paragon-of-brah:mtp-clean-397B

Conversation

@paragon-of-brah

Overview

Add Qwen 3.5 397B Compatibility

Additional information

As noted in the comment, in Qwen 3.5 397B the MTP experts are stored as separate per-expert tensors:

mtp.layers.0.mlp.experts.0.down_proj.weight
mtp.layers.0.mlp.experts.0.gate_proj.weight
mtp.layers.0.mlp.experts.0.up_proj.weight

mtp.layers.1.mlp.experts.0.down_proj.weight
mtp.layers.1.mlp.experts.0.gate_proj.weight
mtp.layers.1.mlp.experts.0.up_proj.weight

etc.

So you get this error:

INFO:hf-to-gguf:blk.60.attn_norm.weight,              torch.bfloat16 --> F32, shape = {4096}
Traceback (most recent call last):
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 13673, in <module>
    main()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 13667, in main
    model_instance.write()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 933, in write
    self.prepare_tensors()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4662, in prepare_tensors
    super().prepare_tensors()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 793, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 5490, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 5446, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4849, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4647, in modify_tensors
    datas.append(self._experts[bid][ename])
                 ~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'model.layers.0.mlp.experts.0.down_proj.weight'

convert_hf_to_gguf.py needs to be changed to handle this difference. I dealt with it by imitating the way Qwen 3.5 35B packs its MTP tensors.
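For illustration, here is a minimal sketch (not the actual convert_hf_to_gguf.py code) of the expert-stacking pattern the converter uses: per-expert tensors are buffered until all experts for a layer have been seen, then stacked into a single 3D `ffn_*_exps` tensor. The function name, `N_EXPERTS` value, and use of numpy instead of torch are all illustrative assumptions.

```python
# Sketch of buffering per-expert MTP tensors and stacking them into one
# tensor per projection, analogous to the converter's _experts buffering.
# All names and the expert count are illustrative, not the real code.
import re
import numpy as np

N_EXPERTS = 2  # illustrative; the real model has far more experts

_experts: dict[int, dict] = {}  # layer id -> {tensor name: tensor}

def buffer_and_stack(name, tensor):
    """Buffer a per-expert tensor; yield stacked tensors once a layer is complete."""
    m = re.match(r"mtp\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(\w+)\.weight", name)
    if m is None:
        # Not a per-expert MTP tensor: pass it through unchanged.
        yield name, tensor
        return
    bid = int(m.group(1))
    _experts.setdefault(bid, {})[name] = tensor
    if len(_experts[bid]) == N_EXPERTS * 3:  # down/gate/up per expert
        for proj in ("down_proj", "gate_proj", "up_proj"):
            datas = [
                _experts[bid].pop(f"mtp.layers.{bid}.mlp.experts.{x}.{proj}.weight")
                for x in range(N_EXPERTS)
            ]
            # e.g. "blk.0.ffn_down_exps.weight" with shape (n_experts, out, in)
            yield f"blk.{bid}.ffn_{proj.split('_')[0]}_exps.weight", np.stack(datas)
```

The key point is that nothing is emitted until every expert of a layer has arrived; the KeyError in the traceback above happens when the lookup runs against names the buffer never saw because the 397B model uses the `mtp.layers.*` prefix.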

I made a quant from the Qwen 3.5 397B weights shipped by the Qwen team using the modified convert_hf_to_gguf.py, then used ik_llama.cpp to make a ~q5 quant and ran it with MTP enabled.

Performance is mediocre, but acceptance rates are OK. I have not tuned this at all, the parameters are somewhat arbitrary, and I'm offloading the MTP experts to CPU because I have no spare VRAM.

      -ot "blk\.60\.ffn.*_exps.*=CPU"
      -mtp
      --recurrent-ckpt-mode per-step
      --draft-p-min 0.8
      --draft-max 6

No MTP

prompt eval time =   35886.98 ms / 14860 tokens (    2.42 ms per token,   414.08 tokens per second)
       eval time =   44691.15 ms /   404 tokens (  110.62 ms per token,     9.04 tokens per second)
      total time =   80578.14 ms / 15264 tokens

MTP

prompt eval time =   36777.28 ms / 14860 tokens (    2.47 ms per token,   404.05 tokens per second)
       eval time =   52738.47 ms /   506 tokens (  104.23 ms per token,     9.59 tokens per second)
      total time =   89515.75 ms / 15366 tokens
draft acceptance rate = 0.90966 (  292 accepted /   321 generated)
statistics mtp: #calls(b,g,a) = 1 213 146, #gen drafts = 146, #acc drafts = 140, #gen tokens = 321, #acc tokens = 292, dur(b,g,a) = 0.001, 6511.409, 0.051 ms
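As a quick sanity check on the figures above, the reported acceptance rate and the modest decode speedup can be recomputed directly from the log lines (all numbers below are taken from the output, nothing new is measured):

```python
# Recompute the headline numbers from the server log above.
accepted, generated = 292, 321
rate = accepted / generated  # log reports 0.90966

# Decode throughput = generated tokens / eval time (seconds)
no_mtp_tps = 404 / 44.69115   # log reports ~9.04 tok/s
mtp_tps    = 506 / 52.73847   # log reports ~9.59 tok/s
speedup = mtp_tps / no_mtp_tps

print(f"acceptance = {rate:.5f}, decode speedup = {speedup:.2f}x")
```

So despite a ~91% draft acceptance rate, the end-to-end decode speedup is only about 6%, which matches the "performance is mediocre" assessment and suggests the draft step itself is expensive in this untuned CPU-offloaded setup.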


I HAVE NOT TESTED any other Qwen version with this modified convert_hf_to_gguf.py, so I don't know whether those still work, although they should.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: I used - ironically - Qwen 3.5 397B to identify the issue

@github-actions github-actions Bot added the python label May 8, 2026
@paragon-of-brah paragon-of-brah force-pushed the mtp-clean-397B branch 3 times, most recently from 1b0c3e2 to 076c6bc Compare May 8, 2026 23:40
