Question about the size of the quantized model

Thanks for the excellent work!

I used examples/basic_quant_mix.py to quantize the Qwen2-7B model with --w_bit 8. Strangely, the quantized model is larger than the original model.

As far as I know, the purpose of quantization is to reduce the model size and thus GPU memory consumption. So why is the model larger after MixQ quantization?
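As a rough sanity check of that expectation, here is a minimal sketch of the weight-only arithmetic. The parameter count (about 7.6 billion) and the 16-bit precision of the original checkpoint are assumptions, and the helper name is hypothetical. It ignores scales, zero points, and any weights that a mixed-precision scheme keeps in higher precision.

```python
def expected_weight_size_gib(num_params: int, bits_per_weight: int) -> float:
    """Weight-only size in GiB: params * bits / 8 bytes, with no metadata."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumption: approximate parameter count for a 7B-class model.
NUM_PARAMS = 7_600_000_000

fp16 = expected_weight_size_gib(NUM_PARAMS, 16)  # assumed original precision
int8 = expected_weight_size_gib(NUM_PARAMS, 8)   # matches --w_bit 8

print(f"16-bit weights: {fp16:.1f} GiB")
print(f" 8-bit weights: {int8:.1f} GiB")
```

Under these assumptions the 8-bit weights alone should take about half the space of the 16-bit ones, which is why a larger quantized model is surprising.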

Thanks!