
After the model is quantized, an error occurs during tensor parallelization #5

@VinayaLee

Description


Hi, we are very interested in MixQ after seeing its performance improvements. We quantized an 8-bit Qwen2.5-32B model with MixQ and would like to run it on two GPUs using tensor parallelism (--tensor-parallel-size 2). However, we encounter the following error during vLLM execution; with --tensor-parallel-size 1 it runs normally. Could you please advise us on how to resolve this? Thank you.

(Screenshot of the error attached: image2024-12-3_11-36-11)
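
For reference, this is roughly how we launch the model. A minimal sketch, assuming a local path to the MixQ-quantized checkpoint and that MixQ is registered in vLLM under the quantization name "mixq" (both are placeholders; adjust to your build):

```python
# Minimal launch sketch: the checkpoint path and the quantization
# method name "mixq" are assumptions, not the confirmed MixQ API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/Qwen2.5-32B-MixQ-8bit",  # MixQ-quantized checkpoint (placeholder path)
    quantization="mixq",                     # assumed method name registered by MixQ in vLLM
    tensor_parallel_size=2,                  # fails with 2; works when set to 1
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```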
