Add 397B compatibility by paragon-of-brah · Pull Request #4 · am17an/llama.cpp

paragon-of-brah · 2026-05-08T22:54:45Z

Overview

Add Qwen 3.5 397B Compatibility

Additional information

As per the comment, the MTP experts in Qwen 3.5 397B are stored as separate per-expert tensors:

mtp.layers.0.mlp.experts.0.down_proj.weight
mtp.layers.0.mlp.experts.0.gate_proj.weight
mtp.layers.0.mlp.experts.0.up_proj.weight

mtp.layers.1.mlp.experts.0.down_proj.weight
mtp.layers.1.mlp.experts.0.gate_proj.weight
mtp.layers.1.mlp.experts.0.up_proj.weight

etc.

So the conversion fails with this error:

INFO:hf-to-gguf:blk.60.attn_norm.weight,              torch.bfloat16 --> F32, shape = {4096}
Traceback (most recent call last):
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 13673, in <module>
    main()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 13667, in main
    model_instance.write()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 933, in write
    self.prepare_tensors()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4662, in prepare_tensors
    super().prepare_tensors()
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 793, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 5490, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 5446, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4849, in modify_tensors
    yield from super().modify_tensors(data_torch, name, bid)
  File "/home/user/AI/llama.cpp/convert_hf_to_gguf.py", line 4647, in modify_tensors
    datas.append(self._experts[bid][ename])
                 ~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: 'model.layers.0.mlp.experts.0.down_proj.weight'

convert_hf_to_gguf.py needs to be changed to handle this layout. I dealt with it by imitating the way Qwen 3.5 35B packs its MTP tensors.
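To illustrate the idea, here is a minimal sketch of grouping per-expert tensor names without assuming a fixed `model.layers.` prefix, so that `mtp.layers.` tensors are picked up too. This is not the code from this PR: `EXPERT_RE` and `group_expert_tensors` are hypothetical names, and the real converter would go on to stack the tensors of each group (for example with `torch.stack`) into one fused tensor.

```python
import re
from collections import defaultdict

# Matches names such as "mtp.layers.0.mlp.experts.3.down_proj.weight".
# The prefix ("model", "mtp", ...) is captured instead of being hardcoded.
EXPERT_RE = re.compile(
    r"^(?P<prefix>.+)\.layers\.(?P<layer>\d+)\.mlp\.experts\."
    r"(?P<expert>\d+)\.(?P<proj>down_proj|gate_proj|up_proj)\.weight$"
)


def group_expert_tensors(names):
    """Group per-expert tensor names by (prefix, layer, projection).

    Names inside each group are ordered by numeric expert index, which is
    the order in which the tensors would be stacked into a fused tensor.
    """
    groups = defaultdict(list)
    for name in names:
        m = EXPERT_RE.match(name)
        if m is None:
            continue  # not a per-expert tensor, leave it alone
        key = (m["prefix"], int(m["layer"]), m["proj"])
        groups[key].append((int(m["expert"]), name))
    return {key: [n for _, n in sorted(items)] for key, items in groups.items()}


if __name__ == "__main__":
    example = [
        "mtp.layers.0.mlp.experts.1.down_proj.weight",
        "mtp.layers.0.mlp.experts.0.down_proj.weight",
        "mtp.layers.0.mlp.experts.0.gate_proj.weight",
    ]
    for key, members in group_expert_tensors(example).items():
        print(key, members)
```

Keying on the captured prefix keeps `model.layers.N` and `mtp.layers.N` experts in separate groups, so a lookup never has to guess which prefix a tensor was stored under.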

I converted the Qwen 3.5 397B checkpoint shipped by the Qwen team with the modified convert_hf_to_gguf.py, then used ik_llama.cpp to make a ~q5 quant and ran it with MTP enabled.

Performance is meh (decode goes from 9.04 to 9.59 tokens per second), but acceptance rates are OK (about 91%). I have not tuned this at all, the parameters are a bit random, and I'm offloading the MTP experts to CPU because I have no VRAM.

      -ot "blk\.60\.ffn.*_exps.*=CPU"
      -mtp
      --recurrent-ckpt-mode per-step
      --draft-p-min 0.8
      --draft-max 6

No MTP

prompt eval time =   35886.98 ms / 14860 tokens (    2.42 ms per token,   414.08 tokens per second)
       eval time =   44691.15 ms /   404 tokens (  110.62 ms per token,     9.04 tokens per second)
      total time =   80578.14 ms / 15264 tokens

MTP

prompt eval time =   36777.28 ms / 14860 tokens (    2.47 ms per token,   404.05 tokens per second)
       eval time =   52738.47 ms /   506 tokens (  104.23 ms per token,     9.59 tokens per second)
      total time =   89515.75 ms / 15366 tokens
draft acceptance rate = 0.90966 (  292 accepted /   321 generated)
statistics mtp: #calls(b,g,a) = 1 213 146, #gen drafts = 146, #acc drafts = 140, #gen tokens = 321, #acc tokens = 292, dur(b,g,a) = 0.001, 6511.409, 0.051 ms
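For reference, the headline figures can be recomputed from the raw counters in the two logs. The snippet below is only arithmetic on the numbers printed above; the variable names are mine.

```python
# Figures copied from the two runs above.
base_tokens, base_ms = 404, 44691.15   # eval without MTP
mtp_tokens, mtp_ms = 506, 52738.47     # eval with MTP
accepted, generated, drafts = 292, 321, 146

base_tps = base_tokens / (base_ms / 1000)   # decode tokens per second, no MTP
mtp_tps = mtp_tokens / (mtp_ms / 1000)      # decode tokens per second, MTP
acceptance = accepted / generated           # draft acceptance rate
speedup = mtp_tps / base_tps                # relative decode speedup
tokens_per_draft = accepted / drafts        # accepted tokens per generated draft

if __name__ == "__main__":
    print(f"no MTP: {base_tps:.2f} t/s, MTP: {mtp_tps:.2f} t/s")
    print(f"speedup: {speedup:.3f}x, acceptance: {acceptance:.5f}")
    print(f"accepted tokens per draft: {tokens_per_draft:.2f}")
```

So with these untuned settings MTP gives roughly a 6% decode speedup, with about two accepted tokens per draft on average.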

I HAVE NOT TESTED any other Qwen version with this convert_hf_to_gguf.py, so I don't know whether they work, though they should.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: I used - ironically - Qwen 3.5 397B to identify the issue

Add 397B compatibility

0d52b69

github-actions Bot added the python label May 8, 2026

paragon-of-brah force-pushed the mtp-clean-397B branch 3 times, most recently from 1b0c3e2 to 076c6bc Compare May 8, 2026 23:40

fix to support all Qwen 3.5/3.6 with fused or not fused experts

ca57840

paragon-of-brah force-pushed the mtp-clean-397B branch from 076c6bc to ca57840 Compare May 8, 2026 23:41

paragon-of-brah wants to merge 2 commits into am17an:mtp-clean from paragon-of-brah:mtp-clean-397B
