Fix lumina2 pad token shape mismatch for some GGUF conversions#392
vaclavmuller wants to merge 1 commit into
|
thank you! this fixed the issue I was having |
|
I'd also like this merged. |
|
Thank you! That solved my problem. |
|
@city96 Is there anything else you need done here? |
|
I don't pretend to understand all of what is going on here, but I've found that this only causes an error if these two layers are in BF16; if they are F32 or F16 they seem to work fine. Not sure why that is. Or, I may be experiencing an unrelated issue. |
|
this should be on the main. |
|
While this fixes loading for zimage turbo, I am getting similar errors for zimage base. z_image_base_Q8_0.gguf: z_image_base_BF16.gguf |
I have the same problem.
|
Thanks for testing and for reporting this. The fix in this PR only addresses the pad token shape mismatch that occurs with some Z-Image Turbo GGUF conversions ( These errors: Q8_0: BF16: suggest that the tensor shapes stored in the GGUF file do not match what the current NextDiT / lumina2 implementation in ComfyUI expects. That is most likely related either to:
In other words, this is probably not the same issue that this PR fixes. For transparency: I am not a core developer of this repository. I ran into the Turbo issue while using ComfyUI and worked through the debugging process with the help of ChatGPT, which helped me understand the loader code and produce a minimal fix for that specific problem. If you want to investigate the Z-Image Base issue further, I would recommend:
If you are not familiar with the loader code, tools like ChatGPT can actually be very helpful for exploring the code and reasoning about shape mismatches like this. That’s essentially how I approached the Turbo issue as well. If someone can identify which specific tensor is triggering the mismatch (the stack trace usually shows it), that would be a good starting point to determine whether this needs:
|
The errors/bugs fixed
1. NextDiT / Lumina Pad Token Shape Mismatch
Python
RuntimeError: Error(s) in loading state_dict for NextDiT:
size mismatch for x_pad_token: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([1, 3840]).
size mismatch for cap_pad_token: copying a param with shape torch.Size([3840]) from checkpoint, the shape in current model is torch.Size([1, 3840]).
The Cause: A recent update to the ComfyUI core changed the expected shape of NextDiT/Lumina pad tokens from 1D [3840] to 2D [1, 3840].
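The mismatch above can be reproduced outside ComfyUI with a few lines of PyTorch. This is only a minimal sketch: `TinyNextDiT` is a hypothetical stand-in module, not the real NextDiT class, and the reshape at the end illustrates the general idea rather than the exact loader change.

```python
import torch
from torch import nn


class TinyNextDiT(nn.Module):
    """Hypothetical stand-in that only holds a 2D pad token, like the newer model."""

    def __init__(self, dim: int = 3840):
        super().__init__()
        self.x_pad_token = nn.Parameter(torch.zeros(1, dim))


model = TinyNextDiT()

# Checkpoint that still stores the pad token as a 1D vector of shape [3840].
checkpoint = {"x_pad_token": torch.ones(3840)}

mismatch_raised = False
try:
    model.load_state_dict(checkpoint)
except RuntimeError:
    # load_state_dict reports "size mismatch for x_pad_token" here.
    mismatch_raised = True

# Adding a leading dimension turns [3840] into [1, 3840], which loads cleanly.
fixed = {k: v.unsqueeze(0) if v.dim() == 1 else v for k, v in checkpoint.items()}
model.load_state_dict(fixed)
```

After the reshape, the parameter keeps its `[1, 3840]` shape and holds the checkpoint values.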
2. BF16 Raw Byte Mismatch
Python
RuntimeError: Error(s) in loading state_dict for NextDiT:
While copying the parameter named "x_pad_token", whose dimensions in the model are torch.Size([1, 3840]) and whose dimensions in the checkpoint are torch.Size([7680]), an exception occurred : ('The size of tensor a (3840) must match the size of tensor b (7680) at non-singleton dimension 1',)
The Cause: Models packed with uncompressed bfloat16 tokens load as raw bytes (3840×2=7680 bytes). Because the PR city96#392 patch forced these tokens to 2D before the BF16 dequantization step, it bypassed the len(shape) <= 1 safety check. The uncompressed bytes were passed directly to ComfyUI without being converted back to floats.
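The 3840 versus 7680 arithmetic can be checked with the standard library alone. The sketch below assumes little-endian storage and uses two hypothetical helpers (`bf16_bytes_to_floats`, `float_to_bf16_bytes`) written for illustration; it is not the `dequantize_tensor` function from the repository.

```python
import struct


def float_to_bf16_bytes(value: float) -> bytes:
    """Truncate a float32 to bfloat16 (no rounding); demonstration only."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.pack("<H", bits >> 16)


def bf16_bytes_to_floats(raw: bytes) -> list:
    """Decode little-endian bfloat16 values into Python floats."""
    if len(raw) % 2 != 0:
        raise ValueError("bfloat16 data must have an even number of bytes")
    halves = struct.unpack(f"<{len(raw) // 2}H", raw)
    # A bfloat16 is the upper 16 bits of an IEEE-754 float32, so shifting
    # left by 16 and reinterpreting the bits recovers the float32 value.
    return [struct.unpack("<f", struct.pack("<I", h << 16))[0] for h in halves]


# A pad token with 3840 bfloat16 elements occupies 3840 * 2 = 7680 bytes.
raw = float_to_bf16_bytes(1.5) * 3840
values = bf16_bytes_to_floats(raw)
```

If the raw buffer is handed over without this conversion, the consumer sees 7680 one-byte elements instead of 3840 values, which matches the `torch.Size([7680])` in the error message.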
🛠️ The Fix
Modified loader.py to ensure that Lumina pad tokens (x_pad_token, cap_pad_token) are properly evaluated by the bfloat16 dequantization check, even when they are forced into a 2D shape.
Replaced the lumina2 shape logic in loader.py with:
Python
is_lumina_pad = (arch_str == "lumina2" and sd_key in ("x_pad_token", "cap_pad_token"))
if is_lumina_pad:
    if len(shape) == 1:
        shape = torch.Size((1, shape[0]))
# add to state dict
if tensor.tensor_type in {gguf.GGMLQuantizationType.F32, gguf.GGMLQuantizationType.F16}:
    torch_tensor = torch_tensor.view(*shape)
state_dict[sd_key] = GGMLTensor(torch_tensor, tensor_type=tensor.tensor_type, tensor_shape=shape)
# 1D tensors shouldn't be quantized, this is a fix for BF16
# Force the fix to run on lumina pad tokens as well
if (len(shape) <= 1 or is_lumina_pad) and tensor.tensor_type == gguf.GGMLQuantizationType.BF16:
    state_dict[sd_key] = dequantize_tensor(state_dict[sd_key], dtype=torch.float32)
This ensures that the 7680 bytes are converted back into 3840 true float values before being handed over to ComfyUI's new [1, 3840] structure.
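The decision logic of the snippet above can be isolated into a small pure function to make its cases easier to check. This is a sketch under assumptions: `plan_tensor_load` is a hypothetical helper, and the tensor type is passed as a plain string instead of `gguf.GGMLQuantizationType`.

```python
from typing import Tuple

PAD_TOKEN_KEYS = ("x_pad_token", "cap_pad_token")


def plan_tensor_load(
    arch: str, key: str, shape: Tuple[int, ...], tensor_type: str
) -> Tuple[Tuple[int, ...], bool]:
    """Return (target_shape, needs_bf16_dequant) for one tensor.

    Hypothetical helper mirroring the conditions in the loader snippet.
    """
    is_lumina_pad = arch == "lumina2" and key in PAD_TOKEN_KEYS
    # Lumina pad tokens stored as [D] are promoted to [1, D].
    if is_lumina_pad and len(shape) == 1:
        shape = (1, shape[0])
    # BF16 data is dequantized for 1D tensors and, after the fix, also for
    # pad tokens that were just promoted to 2D.
    needs_dequant = (len(shape) <= 1 or is_lumina_pad) and tensor_type == "BF16"
    return shape, needs_dequant


pad_bf16 = plan_tensor_load("lumina2", "x_pad_token", (3840,), "BF16")
pad_f16 = plan_tensor_load("lumina2", "cap_pad_token", (3840,), "F16")
```

Without the `or is_lumina_pad` term, the first case would return a 2D shape with no dequantization, which is the situation the second error describes.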
This PR fixes a shape mismatch when loading some lumina2 / NextDiT GGUF models (e.g. Z-Image Turbo GGUF builds).
Some GGUF conversions store x_pad_token and cap_pad_token as 1D vectors ([D]) instead of the expected 2D shape ([1, D]), which causes load_state_dict to fail.
The loader now:
orig_shape metadata is missing
Tested with:
https://huggingface.co/leejet/Z-Image-Turbo-GGUF
Addresses #379