rotate_ov_proj(layer, model_type, num_heads, head_dim)
Thanks for your great work!
In the original QuaRot paper, rotate_ov_proj is designed to rotate the V states so that 4-bit quantization of the value cache becomes easier. But in this repo, I think we only want to rotate the weights and the linear outputs; we do not want to quantize the V cache. So is it still necessary to call rotate_ov_proj here? If my understanding is incorrect, please point it out. I am looking forward to your reply. Thanks!
QQQ/QQQ/rotation/rotation.py, line 194 (commit e307d9f)
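For context, here is a minimal sketch of the head-wise V/O rotation that QuaRot describes: the same orthogonal (Hadamard) matrix is applied to each head's v_proj output channels and folded into the matching o_proj input channels, so the attention output is mathematically unchanged while the V activations become easier to quantize. Function names and shapes below are illustrative assumptions, not this repo's actual implementation.

```python
import torch

def hadamard(n: int) -> torch.Tensor:
    # Sylvester construction; n must be a power of two.
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / (n ** 0.5)  # orthonormal: H @ H.T == I

def rotate_ov(W_v: torch.Tensor, W_o: torch.Tensor,
              num_heads: int, head_dim: int):
    # Rotate each head's v_proj output rows by Q.T and the matching
    # o_proj input columns by Q; since Q @ Q.T == I, the composed
    # o_proj(v_proj(x)) output is unchanged.
    Q = hadamard(head_dim)
    hidden = W_v.shape[1]
    # W_v: (num_heads * head_dim, hidden) -> rotate per-head row blocks.
    Wv = W_v.reshape(num_heads, head_dim, hidden)
    Wv = torch.matmul(Q.T, Wv).reshape(num_heads * head_dim, hidden)
    # W_o: (out, num_heads * head_dim) -> rotate matching column blocks.
    out = W_o.shape[0]
    Wo = W_o.reshape(out, num_heads, head_dim)
    Wo = torch.matmul(Wo, Q).reshape(out, num_heads * head_dim)
    return Wv, Wo
```

If the V cache is quantized, the rotation spreads outliers across channels before quantization; if the V cache is kept in full precision, the rotation is an exact no-op on the model's outputs, which is the crux of the question above.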