
After the model is quantized, an error occurs during tensor parallelization #5

@VinayaLee

Description


Hi, we are very interested in MixQ after seeing its performance improvements. We quantized an 8-bit Qwen2.5-32B model with MixQ and would like to run it on two GPUs using tensor parallelism (--tensor-parallel-size 2). However, we encounter the following error during vLLM execution; with --tensor-parallel-size 1 it runs normally. Could you please advise us on how to resolve this? Thank you.

(Screenshot of the error attached: image2024-12-3_11-36-11)
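
For reference, this is roughly how we launch the model. A minimal sketch, assuming a local path to the MixQ-quantized checkpoint and that MixQ is registered in vLLM under the quantization name "mixq" (both are placeholders; adjust to your build):

```python
# Minimal launch sketch: the checkpoint path and the quantization
# method name "mixq" are assumptions, not the confirmed MixQ API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/Qwen2.5-32B-MixQ-8bit",  # MixQ-quantized checkpoint (placeholder path)
    quantization="mixq",                     # assumed method name registered by MixQ in vLLM
    tensor_parallel_size=2,                  # fails with 2; works when set to 1
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```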
