Neural network quantization aims to accelerate and trim full-precision neural network models by using low-bit approximations. Methods adopting the quantization-aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on both image classification and object detection, despite being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in real-world deployment. To the best of our knowledge, this is the first time that vision transformers have been quantized in object detection and image segmentation tasks.
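The two-stage idea in the abstract (a teacher whose features are quantized, followed by distillation into a quantized student) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the uniform quantizer over [0, 1], the mean-squared-error feature loss, and the function names `quantize_uniform` and `feature_distillation_loss` are all hypothetical.

```python
from typing import List


def quantize_uniform(features: List[float], bits: int) -> List[float]:
    """Clip each value to [0, 1] and snap it to one of 2**bits evenly spaced levels.

    Assumption: a plain uniform quantizer; the paper's quantizer may differ.
    With bits=1 this reduces to a binarized representation.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    steps = (1 << bits) - 1  # number of intervals between adjacent levels
    quantized = []
    for x in features:
        clipped = min(max(x, 0.0), 1.0)
        quantized.append(round(clipped * steps) / steps)
    return quantized


def feature_distillation_loss(student: List[float], teacher_q: List[float]) -> float:
    """Mean squared error between student features and quantized teacher features.

    Assumption: MSE is used here only as a simple stand-in for a distillation loss.
    """
    if len(student) != len(teacher_q):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher_q)) / len(student)


if __name__ == "__main__":
    teacher_features = [-0.3, 0.2, 0.7, 1.4]   # hypothetical teacher activations
    student_features = [0.1, 0.1, 0.9, 0.8]    # hypothetical student activations
    target = quantize_uniform(teacher_features, bits=1)
    print(target)
    print(feature_distillation_loss(student_features, target))
```

In an actual QAT setup the student's weights and activations would also be quantized and the loss would be minimized by gradient descent; the sketch only shows how a quantized feature vector can serve as a distillation target.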