Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Deployment/Quant_guide.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
Deep learning models are typically trained with floating point data but they can quantized into integers during inference without any loss of performance (i.e. accuracy). Quantizing models includes quantizing both the weights and activation data (or layer input/outputs). In this work, we quantize the floating point weights/activation data to [Qm.n format](https://en.wikipedia.org/wiki/Q_(number_format)), where m,n are fixed within a layer but can vary across different network layers.
Deep learning models are typically trained with floating point data but they can be quantized into integers during inference without any loss of performance (i.e. accuracy). Quantizing models includes quantizing both the weights and activation data (or layer input/outputs). In this work, we quantize the floating point weights/activation data to [Qm.n format](https://en.wikipedia.org/wiki/Q_(number_format)), where m,n are fixed within a layer but can vary across different network layers.

## Quantize weights
Quantizing weights is fairly simple, as the weights are fixed after the training and we know their min/max range. Using these ranges, the weights are quantized or discretized to 256 levels. Here is the code snippet for quantizing the weights and biases to 8-bit integers.
Expand Down