Fix FP8 tensor support on MPS backend for Apple Silicon Macs #23
Conversation
After modifying this code in my ComfyUI, I could generate a 544×720 (480p), 5 s video within 25 minutes with the WAN FP8 model (converted to FP16); before, the same generation took an hour and still had not finished. Hope this helps other people who are struggling with this problem.
Hello, I tried your updates. It worked well at first, but I got the same error after the progress bar reached 100%.
Could you paste the error logs here? Maybe there is another problem. I tested on my MacBook M4 Pro Max and successfully output the video. Here is the workflow I used (from Civitai). The result: 162534_00001.webm. My ComfyUI info:
Sure, thanks:

    [2026-01-20 02:04:33.866] Total VRAM 32768 MB, total RAM 32768 MB
    [2026-01-20 02:04:34.289] To see the GUI go to: http://127.0.0.1:8000
    [2026-01-20 03:53:16.673] Prompt executed in 01:48:37
@Owen1226 The problem you encountered looks like a different error, coming from the LTX workflow.

Error location: `stochastic_rounding()` in `comfy/float.py`, during model loading.

Root cause: the LTXAV model uses FP8 quantization. When ComfyUI tries to load it onto MPS, it attempts to do stochastic rounding to FP8 directly on MPS, which doesn't support FP8 operations.

You need to patch `comfy/float.py`'s `stochastic_rounding()` function:

    def stochastic_rounding(value, dtype, seed=None):
        """Round with stochastic rounding."""
        if dtype == torch.float8_e4m3fn or dtype == torch.float8_e5m2:
            if value.device.type == 'mps':
                # MPS doesn't support FP8 - do the rounding on CPU
                generator = torch.Generator(device='cpu')
                generator.manual_seed(seed or 0)
                value_cpu = value.cpu()
                rounded_cpu = manual_stochastic_round_to_float8(value_cpu, dtype, generator=generator)
                return rounded_cpu.to(value.device)
        # Original logic...

Or, better, patch the quantize function in `comfy/quant_ops.py` (around line 79):

    def quantize(self, tensor, **kwargs):
        # ... existing code ...
        if tensor.device.type == 'mps':
            # Move to CPU for FP8 quantization on MPS
            tensor_cpu = tensor.cpu()
            qdata_cpu = comfy.float.stochastic_rounding(tensor_cpu, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)
            qdata = qdata_cpu.to(tensor.device)
        else:
            qdata = comfy.float.stochastic_rounding(tensor, dtype=cls.FP8_DTYPE, seed=stochastic_rounding)

Since I don't have access to the LTXAV model or your specific workflow, would you be willing to test a similar fix for the quantization path? If you confirm this fixes your issue, it would be great if you could submit a separate PR (it seems the error is not located in this repo).
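Both patches hinge on the same underlying operation: stochastic rounding rounds a value up or down at random, with probabilities chosen so the result is correct on average rather than always biased toward the nearest representable value. Here is a torch-free sketch of just that rounding rule (hypothetical helper; a fixed grid `step` stands in for the FP8 value spacing):

```python
import random

def stochastic_round(value, step=0.25, rng=random.random):
    """Round `value` to a multiple of `step`, rounding up with
    probability equal to the fractional distance, so the result
    is unbiased in expectation."""
    lower = (value // step) * step
    frac = (value - lower) / step  # distance past the lower grid point, in [0, 1)
    return lower + step if rng() < frac else lower

# Averaged over many draws, the rounded values recover the true mean,
# which is why stochastic rounding is preferred when quantizing weights
# to a coarse format like FP8.
random.seed(0)
samples = [stochastic_round(0.6) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # close to 0.6, though each sample is 0.5 or 0.75
```

This is why the fix only moves the computation to CPU rather than changing it: the rounding math is device-agnostic, and only the FP8 dtype itself is unsupported on MPS.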
@comfyanonymous This is the model I used: and the workflow: |
Hello. I found my
Regarding "Model conversion to FP16: even after converting the entire WAN model to FP16 and using the FP16 VAE and text encoder, the sampler still produces FP8 tensors internally",

Problem:
Users on Apple Silicon Macs (MPS backend) encounter

    TypeError: Trying to convert Float8_e4m3fn to the MPS backend but it does not have support for that dtype

when running FP8-quantized models through the sampler. This occurs because MPS does not natively support FP8 data-type conversions.

Related Issues:
(all report similar FP8/MPS compatibility issues)

Previous Attempts & Limitations:
CPU fallback (--cpu flag): makes generation prohibitively slow. The conversion script:

Root Cause:
The comfy_kitchen quantization library attempts direct FP8 tensor operations on MPS, which lacks FP8 support. The error occurs in dequantize_per_tensor_fp8() when trying to convert FP8 tensors to other formats on MPS.

Solution:
Add a device-aware fallback in dequantize_per_tensor_fp8(). This minimal fix enables FP8-quantized models to work on Apple Silicon while:
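For context on what a per-tensor dequantize function computes: per-tensor quantization stores one shared scale alongside the low-precision values, and dequantizing is just a cast back to a wide type followed by a multiply by that scale. Below is a torch-free sketch with hypothetical names and integer codes standing in for FP8 values; the real dequantize_per_tensor_fp8() additionally handles the FP8 dtype and device placement, which is exactly where MPS fails:

```python
def quantize_per_tensor(values, levels=256):
    """Map floats to signed integer codes using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / (levels // 2 - 1)  # one scale for the whole tensor
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_per_tensor(codes, scale):
    """Inverse: widen each code and multiply by the shared scale."""
    return [c * scale for c in codes]

weights = [0.5, -1.0, 2.0]
codes, scale = quantize_per_tensor(weights)
restored = dequantize_per_tensor(codes, scale)
# restored approximates weights to within one quantization step (scale)
```

The proposed device-aware fallback leaves this arithmetic unchanged; it only performs the unsupported FP8 conversion on CPU when the tensor lives on MPS, then moves the result back to the MPS device.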