Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization
This paper presents the first FPGA-based acceleration framework for ViT. On the software side, it employs a mixed-scheme quantization that combines power-of-two (PoT) weights, mapped mainly onto LUTs, with fixed-point weights, mapped mainly onto DSPs, to balance FPGA resource utilization. On the hardware side, the accelerator is built with HLS. To address the challenges of layer-wise multi-precision, the framework aligns the outputs of the different quantization schemes and uses the same fixed-point-to-PoT ratio within each head. Parameter tuning proceeds by first fixing a target frames per second (FPS), then determining the precision and scheme combination, and finally adjusting the remaining parameters based on resource-utilization estimates.
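A minimal sketch of the two quantizer families (my own illustrative version, not the paper's exact codebook or bit-width choices): a PoT weight turns each multiplication into a bit shift (LUT-friendly), while a uniform fixed-point weight uses a real multiplier (DSP-friendly).

```python
import numpy as np

def quantize_pot(w, n_bits=4):
    """Quantize each weight to the nearest signed power of two (or zero).
    Multiplying by sign * 2^e is just a shift, so it maps onto LUT fabric."""
    e_min = -(2 ** (n_bits - 1) - 1)  # most negative exponent; one code reserved for 0
    sign = np.sign(w)
    mag = np.abs(w)
    e = np.clip(np.round(np.log2(np.maximum(mag, 2.0 ** e_min))), e_min, 0)
    q = sign * np.exp2(e)
    return np.where(mag < 2.0 ** (e_min - 1), 0.0, q)  # snap tiny weights to zero

def quantize_fixed(w, n_bits=4):
    """Uniform (fixed-point) quantization; multiplications map onto DSP slices."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale
```

In the mixed scheme, each layer's (or head's) weights are split between the two quantizers at a fixed ratio, so that LUT and DSP consumption can be traded off against each other.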
This paper introduces a novel Vision Transformer accelerator designed to address two challenges in ViT and its variants. The first is the path dependency introduced by the shortcut mechanism (residual add). The solution is a half-layer mapping technique that leverages two hardware design patterns: two reuse engines (MSA and MLP) combined with a streaming pattern within each engine. The residual add is relocated from the end of the current engine to the start of the next engine. The paper also analyzes the data locality along different input dimensions to facilitate parallel computation. The design was tested by generating hardware code with HLS and running it on an FPGA. Notably, the data precision used is float16.
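The residual-add relocation can be sketched as follows (a toy numpy model with a stand-in body instead of real MSA/MLP math; all names are mine). Each engine performs the *previous* block's residual add at its own start, so the previous engine can stream its outputs downstream without stalling on the add; the result is mathematically identical to the usual end-of-block residual.

```python
import numpy as np

def engine(x, skip, weight):
    """One reuse engine. The residual add that logically ends the previous
    block is performed here, at the start of this engine."""
    x = x + skip                        # relocated residual add
    out = np.maximum(x @ weight, 0.0)   # stand-in for the MSA/MLP body
    return out, x                       # x feeds the skip path of the next engine

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w1, w2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

out1, skip1 = engine(x, np.zeros_like(x), w1)  # "MSA" engine, no incoming skip
out2, skip2 = engine(out1, skip1, w2)          # "MLP" engine adds the skip first
```

Unrolling the calls shows `out2 == f2(x + f1(x))`, i.e. the standard residual structure, even though no engine ever adds a residual at its output.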
Pruned ViT