Question about per-token quant 

Can you explain exactly how `per-token` quantization is performed on `o_proj` and `down_proj`?

https://github.com/AniZpZ/AutoSmoothQuant/blob/main/autosmoothquant/layers/nn/linear.py#L310
```python
int8_weight, weight_scale = quantize_per_tensor_absmax(module.weight)
if act_quant == "per-token":
    alpha = weight_scale
```
When `act_quant` is `per-token`, `weight_scale` still comes from `quantize_per_tensor_absmax`, which is a bit confusing to me.
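
To make the question concrete, here is a minimal sketch of one possible reading: the weight is quantized once with a single per-tensor scale, while the activation gets one scale per token at runtime, and both scales are applied when dequantizing the int32 accumulator. This is an assumption about the intent, not the repository's confirmed behavior. The helpers `absmax_per_tensor`, `absmax_per_token`, and `int8_linear` are hypothetical NumPy stand-ins written for illustration, not AutoSmoothQuant functions.

```python
import numpy as np

def absmax_per_tensor(w):
    # Hypothetical stand-in: one scale shared by the whole weight tensor.
    scale = max(float(np.abs(w).max()) / 127.0, 1e-8)
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def absmax_per_token(x):
    # Hypothetical stand-in: one scale per row, i.e. per token.
    scale = np.maximum(np.abs(x).max(axis=-1, keepdims=True) / 127.0, 1e-8)
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_linear(x, int8_weight, weight_scale):
    # Activation scales are only known at runtime, so they are computed here.
    x_q, x_scale = absmax_per_token(x)
    # Integer matmul with int32 accumulation.
    acc = x_q.astype(np.int32) @ int8_weight.astype(np.int32).T
    # Dequantize: per-token scale broadcasts over rows, weight scale is a scalar.
    return acc.astype(np.float32) * x_scale * weight_scale

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)   # 4 tokens, hidden size 16
w = rng.standard_normal((8, 16)).astype(np.float32)   # 8 output features
int8_weight, weight_scale = absmax_per_tensor(w)
y = int8_linear(x, int8_weight, weight_scale)
```

Under this reading, only the per-tensor weight scale can be stored ahead of time (as `alpha`), because the per-token activation scales depend on the input. Whether that matches the code linked above is exactly what I would like confirmed.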

