Hi, we are very interested in MixQ after seeing its performance improvements. We quantized an 8-bit Qwen2.5-32B model using MixQ and hoped to run it on two GPUs with tensor parallelism (`--tensor-parallel-size 2`). However, we encountered the following error during vLLM execution. When we set `--tensor-parallel-size` to 1, it runs normally. Could you please advise us on how to resolve this? Thank you.
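For reference, this is roughly how we launch the server — the model path is a placeholder for our local MixQ-quantized checkpoint, and any additional flags we pass are omitted here:

```shell
# Hypothetical invocation; the checkpoint directory name is an
# illustrative placeholder, not the actual path.
# This works with --tensor-parallel-size 1 but errors out with 2.
vllm serve ./Qwen2.5-32B-MixQ-8bit \
    --tensor-parallel-size 2
```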