
Fix FP8 tensor support on MPS backend for Apple Silicon Macs #23

Open
AoiYamada wants to merge 1 commit into Comfy-Org:main from
AoiYamada:fix/convert-float8-e4m3fn-to-the-mps-backend-is-unsupported-error

Conversation

@AoiYamada

Problem:
Users on Apple Silicon Macs (MPS backend) encounter "TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype" when running FP8-quantized models through the sampler. This occurs because MPS does not natively support FP8 dtype conversions.
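
A minimal reproduction of the failure (illustrative only, assuming a PyTorch build with MPS available; this snippet is not part of the PR):

import torch

# MPS has no FP8 cast support, so moving an FP8 tensor onto the device
# raises the TypeError quoted above.
x = torch.zeros(4, dtype=torch.float8_e4m3fn)
x.to("mps")  # TypeError: Trying to convert Float8_e4m3fn to the MPS backend ...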

Related Issues:

(all report similar FP8/MPS compatibility issues)

Previous Attempts & Limitations:

  1. Force CPU execution (--cpu flag): makes generation prohibitively slow
  2. GGUF format: same performance problem as CPU execution
  3. Model conversion to FP16: even after converting the entire WAN model to FP16 and using an FP16 VAE and text encoder, the sampler still produces FP8 tensors internally

The conversion script:

from safetensors.torch import load_file, save_file
import torch
import sys

if len(sys.argv) < 2:
    print("Usage: python3 convert-fp16.py <input_safetensors_file>")
    sys.exit(1)

path = sys.argv[1]  # input file

def convert_safetensors_to_fp16(input_path):
    state_dict = load_file(input_path)
    fp16_state_dict = {}
    for key, value in state_dict.items():
        # Only FP8 tensors need converting; all other tensors are copied as-is.
        if value.dtype == torch.float8_e4m3fn:
            fp16_state_dict[key] = value.to(torch.float16)
        else:
            fp16_state_dict[key] = value

    # Save the new file next to the input (derive the name from the
    # function argument rather than the module-level global).
    output_path = input_path.replace(".safetensors", "") + "-fp16.safetensors"
    save_file(fp16_state_dict, output_path, metadata={"format": "pt"})
    print(f"Converted file saved to {output_path}")

convert_safetensors_to_fp16(path)
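
A quick sanity check on the converted file (the filename below is a placeholder, not from the PR):

from safetensors.torch import load_file
import torch

# "model-fp16.safetensors" stands in for the converted output file.
sd = load_file("model-fp16.safetensors")
assert all(v.dtype != torch.float8_e4m3fn for v in sd.values()), \
    "FP8 tensors still present"
print("no FP8 tensors remain")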

Root Cause:
The comfy_kitchen quantization library attempts direct FP8 tensor operations on MPS, which lacks FP8 support. The error occurs in dequantize_per_tensor_fp8() when trying to convert FP8 tensors to other formats on MPS.

Solution:
Add a device-aware fallback in dequantize_per_tensor_fp8(), sketched below:

  • Detect FP8 tensors on MPS devices
  • Temporarily move tensors to CPU for FP8 conversion (which CPU supports)
  • Return results to MPS for continued processing

This minimal fix enables FP8-quantized models to work on Apple Silicon while:

  • Maintaining MPS acceleration for the majority of operations
  • Avoiding performance penalties of full CPU execution
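
A hedged sketch of that fallback (the function name matches the comfy_kitchen capability list in the logs below, but the signature and the scale-multiply dequantization are assumptions, not the actual implementation):

import torch

def dequantize_per_tensor_fp8(t: torch.Tensor, scale, out_dtype=torch.float16):
    # Assumed per-tensor dequantization: value = fp8_tensor * scale.
    if t.dtype == torch.float8_e4m3fn and t.device.type == "mps":
        # MPS cannot cast FP8 dtypes: convert on the CPU, which can,
        # then move the result back to MPS for the rest of the pipeline.
        return (t.to("cpu").to(out_dtype) * scale).to("mps")
    return t.to(out_dtype) * scale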

@AoiYamada
Author

After modifying this code in my ComfyUI, I could generate a 544x720, 480p, 5-second video within 25 minutes with the WAN FP8 model (converted to FP16), which previously took an hour and still wasn't finished. Hope this helps other people who are struggling with this problem.

@Owen1226

> After modifying this code in my ComfyUI, I could generate a 544x720, 480p, 5-second video within 25 minutes with the WAN FP8 model (converted to FP16), which previously took an hour and still wasn't finished. Hope this helps other people who are struggling with this problem.

Hello, I tried your updates. It worked well at first, but I got the same error after the progress bar reached 100%.

@AoiYamada
Author

AoiYamada commented Jan 20, 2026

> Hello, I tried your updates. It worked well at first, but I got the same error after the progress bar reached 100%.

Could you paste the error logs here? Maybe there is another problem.

I tested on my MacBook M4 Pro Max and successfully output the video. Here is the workflow I used (from Civitai):
DaSiWa WAN 2.2 i2v FastFidelity C-AiO-33.json

The result:

162534_00001.webm

My ComfyUI info:
ComfyUI v0.7.2

@Owen1226

> Hello, I tried your updates. It worked well at first, but I got the same error after the progress bar reached 100%.
>
> Could you paste the error logs here? Maybe there is another problem.
>
> I tested on my MacBook M4 Pro Max and successfully output the video. Here is the workflow I used (from Civitai): DaSiWa WAN 2.2 i2v FastFidelity C-AiO-33.json
>
> The result:
>
> 162534_00001.webm
>
> My ComfyUI info: ComfyUI v0.7.2

Sure, thanks:

"""
** ComfyUI startup time: 2026-01-20 02:04:30.819
[2026-01-20 02:04:30.819] ** Platform: Darwin
[2026-01-20 02:04:30.820] ** Python version: 3.12.11 (main, Aug 18 2025, 19:02:39) [Clang 20.1.4 ]
[2026-01-20 02:04:30.820] ** Python executable: /Users/owen/Projects/ComfyUI/.venv/bin/python
[2026-01-20 02:04:30.820] ** ComfyUI Path: /Applications/ComfyUI.app/Contents/Resources/ComfyUI
[2026-01-20 02:04:30.820] ** ComfyUI Base Folder Path: /Applications/ComfyUI.app/Contents/Resources/ComfyUI
[2026-01-20 02:04:30.820] ** User directory: /Users/owen/Projects/ComfyUI/user
[2026-01-20 02:04:30.820] ** ComfyUI-Manager config path: /Users/owen/Projects/ComfyUI/user/__manager/config.ini
[2026-01-20 02:04:30.820] ** Log path: /Users/owen/Projects/ComfyUI/user/comfyui.log
[ComfyUI-Manager] Skipped fixing the 'comfyui-frontend-package' dependency because the ComfyUI is outdated.
[2026-01-20 02:04:30.863] [PRE] ComfyUI-Manager
[2026-01-20 02:04:31.486] Checkpoint files will always be loaded safely.
[2026-01-20 02:04:31.499] Total VRAM 32768 MB, total RAM 32768 MB
[2026-01-20 02:04:31.500] pytorch version: 2.11.0.dev20260119
[2026-01-20 02:04:31.501] Mac Version (26, 2)
[2026-01-20 02:04:31.502] Set vram state to: SHARED
[2026-01-20 02:04:31.502] Device: mps
[2026-01-20 02:04:32.009] Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
[2026-01-20 02:04:32.009] Found comfy_kitchen backend triton: {'available': False, 'disabled': True, 'unavailable_reason': "ImportError: No module named 'triton'", 'capabilities': []}
[2026-01-20 02:04:32.009] Found comfy_kitchen backend cuda: {'available': False, 'disabled': True, 'unavailable_reason': 'Extension file not found: /Users/owen/Projects/ComfyUI/.venv/lib/python3.12/site-packages/comfy_kitchen/backends/cuda/_C.abi3.so', 'capabilities': []}
[2026-01-20 02:04:32.194] Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
[2026-01-20 02:04:33.509] Python version: 3.12.11 (main, Aug 18 2025, 19:02:39) [Clang 20.1.4 ]
[2026-01-20 02:04:33.509] ComfyUI version: 0.8.2
[2026-01-20 02:04:33.511] [Prompt Server] web root: /Applications/ComfyUI.app/Contents/Resources/ComfyUI/web_custom_versions/desktop_app
[2026-01-20 02:04:33.511] [START] ComfyUI-Manager
[2026-01-20 02:04:33.673] [ComfyUI-Manager] network_mode: public
[2026-01-20 02:04:33.680] [ComfyUI-Manager] The matrix sharing feature has been disabled because the matrix-nio dependency is not installed.
To use this feature, please run the following command:
/Users/owen/Projects/ComfyUI/.venv/bin/python -m pip install matrix-nio

[2026-01-20 02:04:33.866] Total VRAM 32768 MB, total RAM 32768 MB
[2026-01-20 02:04:33.866] pytorch version: 2.11.0.dev20260119
[2026-01-20 02:04:33.866] Mac Version (26, 2)
[2026-01-20 02:04:33.866] Set vram state to: SHARED
[2026-01-20 02:04:33.866] Device: mps
[2026-01-20 02:04:34.130] ComfyUI-GGUF: Allowing full torch compile
[2026-01-20 02:04:34.133]
Import times for custom nodes:
[2026-01-20 02:04:34.133] 0.0 seconds: /Applications/ComfyUI.app/Contents/Resources/ComfyUI/custom_nodes/websocket_image_save.py
[2026-01-20 02:04:34.133] 0.0 seconds: /Users/owen/Projects/ComfyUI/custom_nodes/ComfyUI-GGUF
[2026-01-20 02:04:34.133]
[2026-01-20 02:04:34.230] setup plugin alembic.autogenerate.schemas
[2026-01-20 02:04:34.230] setup plugin alembic.autogenerate.tables
[2026-01-20 02:04:34.231] setup plugin alembic.autogenerate.types
[2026-01-20 02:04:34.231] setup plugin alembic.autogenerate.constraints
[2026-01-20 02:04:34.231] setup plugin alembic.autogenerate.defaults
[2026-01-20 02:04:34.231] setup plugin alembic.autogenerate.comments
[2026-01-20 02:04:34.271] Failed to initialize database. Please ensure you have installed the latest requirements. If the error persists, please report this as in future the database will be required: (sqlite3.OperationalError) unable to open database file
(Background on this error at: https://sqlalche.me/e/20/e3q8)
[2026-01-20 02:04:34.288] Starting server

[2026-01-20 02:04:34.289] To see the GUI go to: http://127.0.0.1:8000
[2026-01-20 02:04:34.712] comfyui-frontend-package not found in requirements.txt
[2026-01-20 02:04:39.296] got prompt
[2026-01-20 02:05:29.553] clip missing: ['gemma3_12b.logit_scale', 'gemma3_12b.transformer.model.embed_tokens.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.0.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.0.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.0.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.0.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.0.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.0.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.0.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.0.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.1.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.1.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.1.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.1.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.1.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.1.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.1.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.1.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.2.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.2.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.2.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.2.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.2.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.2.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.2.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.2.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.3.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.3.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.3.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.3.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.3.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.3.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.3.pre_feedforward_layernorm.weight', 
'gemma3_12b.transformer.model.layers.3.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.4.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.4.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.4.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.4.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.4.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.4.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.4.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.4.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.5.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.5.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.5.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.5.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.5.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.5.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.5.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.5.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.6.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.6.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.6.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.6.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.6.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.6.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.6.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.6.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.7.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.7.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.7.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.7.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.7.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.7.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.7.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.7.post_feedforward_layernorm.weight', 
'gemma3_12b.transformer.model.layers.8.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.8.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.8.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.8.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.8.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.8.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.8.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.8.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.8.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.8.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.8.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.8.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.8.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.9.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.9.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.9.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.9.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.9.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.9.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.9.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.9.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.10.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.10.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.10.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.10.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.10.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.10.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.10.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.10.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.11.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.11.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.11.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.11.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.11.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.11.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.11.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.11.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.12.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.12.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.12.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.12.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.12.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.12.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.12.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.12.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.12.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.12.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.12.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.12.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.12.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.13.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.13.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.13.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.13.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.13.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.13.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.13.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.13.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.14.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.14.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.14.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.14.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.14.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.14.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.14.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.14.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.15.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.15.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.15.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.15.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.15.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.15.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.15.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.15.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.16.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.16.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.16.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.16.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.16.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.16.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.16.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.16.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.16.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.16.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.16.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.16.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.16.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.17.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.17.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.17.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.17.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.17.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.17.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.17.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.17.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.18.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.18.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.18.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.18.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.18.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.18.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.18.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.18.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.19.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.19.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.19.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.19.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.19.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.19.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.19.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.19.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.20.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.20.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.20.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.20.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.20.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.20.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.20.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.20.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.20.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.20.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.20.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.20.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.20.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.21.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.21.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.21.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.21.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.21.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.21.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.21.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.21.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.22.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.22.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.22.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.22.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.22.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.22.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.22.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.22.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.23.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.23.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.23.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.23.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.23.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.23.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.23.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.23.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.24.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.24.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.24.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.24.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.24.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.24.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.24.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.24.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.24.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.24.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.24.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.24.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.24.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.25.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.25.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.25.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.25.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.25.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.25.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.25.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.25.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.26.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.26.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.26.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.26.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.26.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.26.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.26.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.26.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.27.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.27.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.27.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.27.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.27.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.27.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.27.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.27.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.28.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.28.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.28.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.28.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.28.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.28.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.28.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.28.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.28.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.28.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.28.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.28.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.28.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.29.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.29.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.29.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.29.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.29.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.29.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.29.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.29.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.30.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.30.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.30.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.30.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.30.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.30.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.30.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.30.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.31.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.31.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.31.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.31.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.31.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.31.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.31.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.31.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.32.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.32.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.32.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.32.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.32.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.32.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.32.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.32.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.32.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.32.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.32.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.32.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.32.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.33.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.33.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.33.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.33.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.33.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.33.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.33.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.33.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.34.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.34.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.34.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.34.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.34.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.34.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.34.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.34.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.35.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.35.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.35.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.35.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.35.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.35.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.35.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.35.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.36.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.36.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.36.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.36.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.36.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.36.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.36.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.36.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.36.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.36.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.36.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.36.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.36.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.37.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.37.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.37.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.37.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.37.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.37.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.37.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.37.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.38.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.38.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.38.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.38.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.38.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.38.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.38.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.38.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.39.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.39.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.39.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.39.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.39.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.39.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.39.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.39.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.40.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.40.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.40.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.40.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.40.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.40.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.40.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.40.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.40.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.40.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.40.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.40.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.40.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.41.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.41.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.41.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.41.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.41.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.41.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.41.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.41.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.42.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.42.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.42.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.42.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.42.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.42.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.42.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.42.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.43.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.43.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.43.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.43.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.43.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.43.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.43.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.43.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.44.self_attn.q_proj.weight', 
'gemma3_12b.transformer.model.layers.44.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.44.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.44.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.44.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.44.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.44.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.44.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.44.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.44.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.44.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.44.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.44.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.45.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.45.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.45.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.45.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.45.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.45.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.45.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.45.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.46.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.46.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.46.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.46.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.46.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.46.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.46.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.46.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.q_proj.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.k_proj.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.v_proj.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.o_proj.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.q_norm.weight', 'gemma3_12b.transformer.model.layers.47.self_attn.k_norm.weight', 'gemma3_12b.transformer.model.layers.47.mlp.gate_proj.weight', 'gemma3_12b.transformer.model.layers.47.mlp.up_proj.weight', 'gemma3_12b.transformer.model.layers.47.mlp.down_proj.weight', 'gemma3_12b.transformer.model.layers.47.input_layernorm.weight', 'gemma3_12b.transformer.model.layers.47.post_attention_layernorm.weight', 'gemma3_12b.transformer.model.layers.47.pre_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.layers.47.post_feedforward_layernorm.weight', 'gemma3_12b.transformer.model.norm.weight', 'gemma3_12b.transformer.multi_modal_projector.mm_input_projection_weight', 
'gemma3_12b.transformer.multi_modal_projector.mm_soft_emb_norm.weight', 'gemma3_12b.transformer.vision_model.embeddings.patch_embedding.weight', 'gemma3_12b.transformer.vision_model.embeddings.patch_embedding.bias', 'gemma3_12b.transformer.vision_model.embeddings.position_embedding.weight', ..., 'gemma3_12b.transformer.vision_model.encoder.layers.{0..26}.{layer_norm1,layer_norm2,self_attn.{q,k,v,out}_proj,mlp.{fc1,fc2}}.{weight,bias}', ..., 'gemma3_12b.transformer.vision_model.post_layernorm.weight', 'gemma3_12b.transformer.vision_model.post_layernorm.bias']
[2026-01-20 02:05:29.568] Requested to load LTXAVTEModel_
[2026-01-20 02:05:29.716] loaded completely; 25965.49 MB loaded, full load: True
[2026-01-20 02:05:29.755] CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
[2026-01-20 02:05:54.305] FETCH ComfyRegistry Data [DONE]
[2026-01-20 02:05:54.493] [ComfyUI-Manager] default cache updated: https://api.comfy.org/nodes
[2026-01-20 02:05:54.539] FETCH DATA from: /Users/owen/Projects/ComfyUI/user/__manager/cache/1514988643_custom-node-list.json [DONE]
[2026-01-20 02:05:54.561] [ComfyUI-Manager] All startup tasks have been completed.
[2026-01-20 02:07:33.791] Found quantization metadata version 1
[2026-01-20 02:07:33.792] Detected mixed precision quantization
[2026-01-20 02:07:33.801] Using mixed precision operations
[2026-01-20 02:07:33.919] model weight dtype torch.bfloat16, manual cast: torch.bfloat16
[2026-01-20 02:07:33.923] model_type FLUX
[2026-01-20 02:08:28.603] unet unexpected: ['audio_embeddings_connector.learnable_registers', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.k_norm.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.q_norm.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_k.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_k.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_out.0.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_out.0.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_q.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_q.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_v.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.attn1.to_v.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.ff.net.0.proj.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.ff.net.0.proj.weight', 'audio_embeddings_connector.transformer_1d_blocks.0.ff.net.2.bias', 'audio_embeddings_connector.transformer_1d_blocks.0.ff.net.2.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.k_norm.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.q_norm.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_k.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_k.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_out.0.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_out.0.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_q.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_q.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_v.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.attn1.to_v.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.ff.net.0.proj.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.ff.net.0.proj.weight', 'audio_embeddings_connector.transformer_1d_blocks.1.ff.net.2.bias', 'audio_embeddings_connector.transformer_1d_blocks.1.ff.net.2.weight', 'video_embeddings_connector.learnable_registers', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.k_norm.weight', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.q_norm.weight', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_k.bias', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_k.weight', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_out.0.bias', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_out.0.weight', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_q.bias', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_q.weight', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_v.bias', 'video_embeddings_connector.transformer_1d_blocks.0.attn1.to_v.weight', 'video_embeddings_connector.transformer_1d_blocks.0.ff.net.0.proj.bias', 'video_embeddings_connector.transformer_1d_blocks.0.ff.net.0.proj.weight', 'video_embeddings_connector.transformer_1d_blocks.0.ff.net.2.bias', 'video_embeddings_connector.transformer_1d_blocks.0.ff.net.2.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.k_norm.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.q_norm.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_k.bias', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_k.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_out.0.bias', 
'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_out.0.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_q.bias', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_q.weight', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_v.bias', 'video_embeddings_connector.transformer_1d_blocks.1.attn1.to_v.weight', 'video_embeddings_connector.transformer_1d_blocks.1.ff.net.0.proj.bias', 'video_embeddings_connector.transformer_1d_blocks.1.ff.net.0.proj.weight', 'video_embeddings_connector.transformer_1d_blocks.1.ff.net.2.bias', 'video_embeddings_connector.transformer_1d_blocks.1.ff.net.2.weight']
[2026-01-20 02:08:45.635] VAE load device: mps, offload device: cpu, dtype: torch.bfloat16
[2026-01-20 02:08:45.641] no CLIP/text encoder weights in checkpoint, the text encoder model will not be loaded.
[2026-01-20 02:08:45.646] loaded diffusion model directly to GPU
[2026-01-20 02:08:45.646] Requested to load LTXAV
[2026-01-20 02:08:45.905] loaded completely; 20541.27 MB loaded, full load: True
[2026-01-20 03:50:42.265]
100%|██████████| 20/20 [1:41:55<00:00, 307.01s/it]
100%|██████████| 20/20 [1:41:55<00:00, 305.77s/it]
[2026-01-20 03:51:01.266] lora key not loaded: text_embedding_projection.aggregate_embed.lora_A.weight
[2026-01-20 03:51:01.267] lora key not loaded: text_embedding_projection.aggregate_embed.lora_B.weight
[2026-01-20 03:51:01.326] Requested to load LTXAV
[2026-01-20 03:51:01.351] 0 models unloaded.
[2026-01-20 03:53:16.601] !!! Exception during processing !!! Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.
[2026-01-20 03:53:16.646] Traceback (most recent call last):
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 518, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 291, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_api/internal/init.py", line 149, in wrapped_func
return method(locked_class, **inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_api/latest/_io.py", line 1570, in EXECUTE_NORMALIZED
to_return = cls.execute(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 950, in execute
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/samplers.py", line 1050, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/samplers.py", line 984, in outer_sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/sampler_helpers.py", line 130, in prepare_sampling
return executor.execute(model, noise_shape, conds, model_options=model_options, force_full_load=force_full_load)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/sampler_helpers.py", line 138, in prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory, force_full_load=force_full_load)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 704, in load_models_gpu
loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 509, in model_load
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 539, in model_use_more_vram
return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 983, in partially_load
raise e
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 980, in partially_load
self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 778, in load
self.patch_weight_to_device(key, device_to=device_to)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 640, in patch_weight_to_device
set_func(out_weight, inplace_update=inplace_update, seed=string_to_seed(key))
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/ops.py", line 693, in set_weight
weight = QuantizedTensor.from_float(weight, self.layout_type, scale="recalculate", stochastic_rounding=seed, inplace_ops=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owen/Projects/ComfyUI/.venv/lib/python3.12/site-packages/comfy_kitchen/tensor/base.py", line 234, in from_float
qdata, params = get_layout_class(layout_cls).quantize(tensor, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/quant_ops.py", line 79, in quantize
qdata = comfy.float.stochastic_rounding(tensor, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/float.py", line 64, in stochastic_rounding
output[i:i+slice_size].copy_(manual_stochastic_round_to_float8(value[i:i+slice_size], dtype, generator=generator))
TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype.

[2026-01-20 03:53:16.673] Prompt executed in 01:48:37
"""

@AoiYamada
Copy link
Copy Markdown
Author

AoiYamada commented Jan 20, 2026

@Owen1226, The problem you encountered looks like a different error, coming from the LTX workflow.

Error Location:

   File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/float.py", line 64, in stochastic_rounding
   output[i:i+slice_size].copy_(manual_stochastic_round_to_float8(value[i:i+slice_size], dtype, generator=generator))

Root Cause:

The LTXAV model uses FP8 quantization. When ComfyUI loads it onto MPS, it attempts stochastic rounding to FP8 directly on MPS, which does not support FP8 operations.

This new error occurs in stochastic_rounding() during model loading.

You need to patch comfy/float.py's stochastic_rounding() function:

def stochastic_rounding(value, dtype, seed=0):
    """Round with stochastic rounding."""
    if dtype == torch.float8_e4m3fn or dtype == torch.float8_e5m2:
        if value.device.type == 'mps':
            # MPS doesn't support FP8 - use CPU for the rounding.
            # The RNG must live on the same device as the tensors it fills,
            # so create a fresh CPU generator instead of reusing an MPS one.
            generator = torch.Generator(device="cpu")
            generator.manual_seed(seed)
            value_cpu = value.cpu()
            rounded_cpu = manual_stochastic_round_to_float8(value_cpu, dtype, generator=generator)
            # Note: moving the FP8 result back to MPS may hit the same dtype
            # limitation; if so, the quantized payload has to stay on CPU.
            return rounded_cpu.to(value.device)

    # Original logic...

Or, better, patch the quantize function:

In comfy/quant_ops.py line 79:

@classmethod
def quantize(cls, tensor, scale=None, stochastic_rounding=0, inplace_ops=False):
    # ... existing code ...
    if tensor.device.type == 'mps':
        # Move to CPU for FP8 quantization on MPS, then return to the original device
        tensor_cpu = tensor.cpu()
        qdata_cpu = comfy.float.stochastic_rounding(tensor_cpu, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)
        qdata = qdata_cpu.to(tensor.device)
    else:
        qdata = comfy.float.stochastic_rounding(tensor, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)

Since I don't have access to the LTXAV model or your specific workflow, would you be willing to test a similar fix for the quantization path?

If you confirm this fixes your issue, it would be great if you could submit a separate PR (the error does not seem to originate in this repo).
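
For anyone who wants to confirm the underlying limitation before patching, here is a minimal check (a sketch only: it assumes a recent PyTorch with an MPS build, and the variable names are illustrative):

import torch

x = torch.randn(4, 4, device="mps")
try:
    # Casting to FP8 on MPS raises; this is the same dtype limitation as above
    x.to(torch.float8_e4m3fn)
except (TypeError, RuntimeError) as e:
    print("FP8 cast on MPS failed:", e)

# The same cast works on CPU, which is why the CPU round-trip fallback helps
x_fp8_cpu = x.cpu().to(torch.float8_e4m3fn)
print(x_fp8_cpu.dtype, x_fp8_cpu.device)  # torch.float8_e4m3fn cpu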

@AoiYamada
Copy link
Copy Markdown
Author

@comfyanonymous
Hi, could you please check this?
I need to patch this manually every time I update my ComfyUI 😭

This is the model I used:
https://civitai.com/models/1981116?modelVersionId=2512098

and the workflow:

DaSiWa WAN 2.2 i2v FastFidelity C-AiO-33.json

@Owen1226
Copy link
Copy Markdown

Owen1226 commented Jan 23, 2026

@Owen1226, The problem you encountered looks like a different error, coming from the LTX workflow. […]

Hello. I found that my comfy/quant_ops.py (below) is quite different from yours. Let me try your update in comfy/float.py instead.

import torch
import logging

try:
    import comfy_kitchen as ck
    from comfy_kitchen.tensor import (
        QuantizedTensor,
        QuantizedLayout,
        TensorCoreFP8Layout as _CKFp8Layout,
        TensorCoreNVFP4Layout as _CKNvfp4Layout,
        register_layout_op,
        register_layout_class,
        get_layout_class,
    )
    _CK_AVAILABLE = True
    if torch.version.cuda is None:
        ck.registry.disable("cuda")
    else:
        cuda_version = tuple(map(int, str(torch.version.cuda).split('.')))
        if cuda_version < (13,):
            ck.registry.disable("cuda")
            logging.warning("WARNING: You need pytorch with cu130 or higher to use optimized CUDA operations.")

    ck.registry.disable("triton")
    for k, v in ck.list_backends().items():
        logging.info(f"Found comfy_kitchen backend {k}: {v}")
except ImportError as e:
    logging.error(f"Failed to import comfy_kitchen, Error: {e}, fp8 and fp4 support will not be available.")
    _CK_AVAILABLE = False

    class QuantizedTensor:
        pass

    class _CKFp8Layout:
        pass

    class _CKNvfp4Layout:
        pass

    def register_layout_class(name, cls):
        pass

    def get_layout_class(name):
        return None

import comfy.float

# ==============================================================================
# FP8 Layouts with Comfy-Specific Extensions
# ==============================================================================

class _TensorCoreFP8LayoutBase(_CKFp8Layout):
    FP8_DTYPE = None  # Must be overridden in subclass

    @classmethod
    def quantize(cls, tensor, scale=None, stochastic_rounding=0, inplace_ops=False):
        if cls.FP8_DTYPE is None:
            raise NotImplementedError(f"{cls.__name__} must define FP8_DTYPE")

        orig_dtype = tensor.dtype
        orig_shape = tuple(tensor.shape)

        if isinstance(scale, str) and scale == "recalculate":
            scale = torch.amax(tensor.abs()).to(dtype=torch.float32) / torch.finfo(cls.FP8_DTYPE).max
            if tensor.dtype not in [torch.float32, torch.bfloat16]:  # Prevent scale from being too small
                tensor_info = torch.finfo(tensor.dtype)
                scale = (1.0 / torch.clamp((1.0 / scale), min=tensor_info.min, max=tensor_info.max))

        if scale is None:
            scale = torch.ones((), device=tensor.device, dtype=torch.float32)
        if not isinstance(scale, torch.Tensor):
            scale = torch.tensor(scale, device=tensor.device, dtype=torch.float32)

        if stochastic_rounding > 0:
            if inplace_ops:
                tensor *= (1.0 / scale).to(tensor.dtype)
            else:
                tensor = tensor * (1.0 / scale).to(tensor.dtype)
            qdata = comfy.float.stochastic_rounding(tensor, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)
        else:
            qdata = ck.quantize_per_tensor_fp8(tensor, scale, cls.FP8_DTYPE)

        params = cls.Params(scale=scale.float(), orig_dtype=orig_dtype, orig_shape=orig_shape)
        return qdata, params


class TensorCoreNVFP4Layout(_CKNvfp4Layout):
    @classmethod
    def quantize(cls, tensor, scale=None, stochastic_rounding=0, inplace_ops=False):
        if tensor.dim() != 2:
            raise ValueError(f"NVFP4 requires 2D tensor, got {tensor.dim()}D")

        orig_dtype = tensor.dtype
        orig_shape = tuple(tensor.shape)

        if scale is None or (isinstance(scale, str) and scale == "recalculate"):
            scale = torch.amax(tensor.abs()) / (ck.float_utils.F8_E4M3_MAX * ck.float_utils.F4_E2M1_MAX)

        if not isinstance(scale, torch.Tensor):
            scale = torch.tensor(scale)
        scale = scale.to(device=tensor.device, dtype=torch.float32)

        padded_shape = cls.get_padded_shape(orig_shape)
        needs_padding = padded_shape != orig_shape

        if stochastic_rounding > 0:
            qdata, block_scale = comfy.float.stochastic_round_quantize_nvfp4(tensor, scale, pad_16x=needs_padding, seed=stochastic_rounding)
        else:
            qdata, block_scale = ck.quantize_nvfp4(tensor, scale, pad_16x=needs_padding)

        params = cls.Params(
            scale=scale,
            orig_dtype=orig_dtype,
            orig_shape=orig_shape,
            block_scale=block_scale,
        )
        return qdata, params


class TensorCoreFP8E4M3Layout(_TensorCoreFP8LayoutBase):
    FP8_DTYPE = torch.float8_e4m3fn


class TensorCoreFP8E5M2Layout(_TensorCoreFP8LayoutBase):
    FP8_DTYPE = torch.float8_e5m2


# Backward compatibility alias - default to E4M3
TensorCoreFP8Layout = TensorCoreFP8E4M3Layout


# ==============================================================================
# Registry
# ==============================================================================

register_layout_class("TensorCoreFP8Layout", TensorCoreFP8Layout)
register_layout_class("TensorCoreFP8E4M3Layout", TensorCoreFP8E4M3Layout)
register_layout_class("TensorCoreFP8E5M2Layout", TensorCoreFP8E5M2Layout)
register_layout_class("TensorCoreNVFP4Layout", TensorCoreNVFP4Layout)

QUANT_ALGOS = {
    "float8_e4m3fn": {
        "storage_t": torch.float8_e4m3fn,
        "parameters": {"weight_scale", "input_scale"},
        "comfy_tensor_layout": "TensorCoreFP8E4M3Layout",
    },
    "float8_e5m2": {
        "storage_t": torch.float8_e5m2,
        "parameters": {"weight_scale", "input_scale"},
        "comfy_tensor_layout": "TensorCoreFP8E5M2Layout",
    },
    "nvfp4": {
        "storage_t": torch.uint8,
        "parameters": {"weight_scale", "weight_scale_2", "input_scale"},
        "comfy_tensor_layout": "TensorCoreNVFP4Layout",
        "group_size": 16,
    },
}


# ==============================================================================
# Re-exports for backward compatibility
# ==============================================================================

__all__ = [
    "QuantizedTensor",
    "QuantizedLayout",
    "TensorCoreFP8Layout",
    "TensorCoreFP8E4M3Layout",
    "TensorCoreFP8E5M2Layout",
    "TensorCoreNVFP4Layout",
    "QUANT_ALGOS",
    "register_layout_op",
]

@Owen1226
Copy link
Copy Markdown

I updated float.py as you suggested, but I still got the error below.


"""
2026-01-23T02:58:18.580167 - Traceback (most recent call last):
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 518, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/execution.py", line 291, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_api/internal/init.py", line 149, in wrapped_func
return method(locked_class, **inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_api/latest/_io.py", line 1582, in EXECUTE_NORMALIZED
to_return = cls.execute(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 950, in execute
samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/samplers.py", line 1050, in sample
output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/samplers.py", line 984, in outer_sample
self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/sampler_helpers.py", line 130, in prepare_sampling
return executor.execute(model, noise_shape, conds, model_options=model_options, force_full_load=force_full_load)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/sampler_helpers.py", line 138, in _prepare_sampling
comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory, force_full_load=force_full_load)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 715, in load_models_gpu
loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 520, in model_load
self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_management.py", line 550, in model_use_more_vram
return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 983, in partially_load
raise e
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 980, in partially_load
self.load(device_to, lowvram_model_memory=current_used + extra_memory, force_patch_weights=force_patch_weights, full_load=full_load)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 778, in load
self.patch_weight_to_device(key, device_to=device_to)
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/model_patcher.py", line 640, in patch_weight_to_device
set_func(out_weight, inplace_update=inplace_update, seed=string_to_seed(key))
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/ops.py", line 702, in set_weight
weight = QuantizedTensor.from_float(weight, self.layout_type, scale="recalculate", stochastic_rounding=seed, inplace_ops=True).to(self.weight.dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/owen/Projects/ComfyUI/.venv/lib/python3.12/site-packages/comfy_kitchen/tensor/base.py", line 234, in from_float
qdata, params = get_layout_class(layout_cls).quantize(tensor, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/quant_ops.py", line 79, in quantize
qdata = comfy.float.stochastic_rounding(tensor, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/float.py", line 64, in stochastic_rounding
rounded_cpu = manual_stochastic_round_to_float8(value_cpu, dtype, generator=generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/float.py", line 36, in manual_stochastic_round_to_float8
abs_x[:] = calc_mantissa(abs_x, exponent, normal_mask, MANTISSA_BITS, EXPONENT_BIAS, generator=generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Applications/ComfyUI.app/Contents/Resources/ComfyUI/comfy/float.py", line 10, in calc_mantissa
mantissa_scaled += torch.rand(mantissa_scaled.size(), dtype=mantissa_scaled.dtype, layout=mantissa_scaled.layout, device=mantissa_scaled.device, generator=generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Placeholder storage has not been allocated on MPS device!
"""

@bedovyy
Copy link
Copy Markdown

bedovyy commented Jan 25, 2026

Regarding 'Model conversion to FP16: Even after converting all WAN model to FP16 and using FP16 vae, text encoder, the sampler still produces FP8 tensors internally':
it may be because your safetensors file still has comfy_quant keys, which carry the weights' dtype information (so loaders still treat the file as FP8).
Try removing them during conversion, like:

  if "comfy_quant" in key:
    continue
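
A minimal sketch of how that skip could slot into a conversion loop (assuming a state_dict loaded via safetensors' load_file, as in the conversion script; the dict names are illustrative):

fp16_state_dict = {}
for key, value in state_dict.items():
    if "comfy_quant" in key:
        # drop the quantization metadata so loaders stop treating the file as FP8
        continue
    if value.dtype == torch.float8_e4m3fn:
        fp16_state_dict[key] = value.to(torch.float16)
    else:
        fp16_state_dict[key] = value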
