Quantized networks use fewer computational and memory resources and are suitable for deployment on edge devices. While quantization-aware training (QAT) is a well-studied approach to quantizing networks at low precision, most research focuses on over-parameterized networks for classification, with limited studies on popular, edge-device-friendly single-shot object detection and semantic segmentation methods such as YOLO. Moreover, the majority of QAT methods rely on the Straight-Through Estimator (STE) approximation, which suffers from an oscillation phenomenon that results in sub-optimal network quantization. In this paper, we show that it is difficult to achieve extremely low precision (4-bit and lower) for efficient YOLO models, even with state-of-the-art QAT methods, because of the oscillation issue, and that existing methods for overcoming this problem are not effective on these models. To mitigate the effect of oscillation, we first propose an Exponential Moving Average (EMA) based update to the QAT model. Further, we propose a simple QAT correction method, namely QC, that takes only a single epoch of training after the standard QAT procedure to correct the error induced by oscillating weights and activations, resulting in a more accurate quantized model. With extensive evaluation on the COCO dataset using various YOLOv5 and YOLOv7 variants, we show that our correction method consistently improves quantized YOLO networks on both object detection and segmentation tasks at low precision (4-bit and 3-bit).
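As a rough illustration of the oscillation issue and of why EMA smoothing can help, the sketch below quantizes a latent weight that hovers around a rounding boundary, once directly and once after averaging. It is a toy example under stated assumptions, not the update rule or the QC procedure proposed in the paper: the helper names (`fake_quantize`, `ema_update`, `count_flips`), the decay of 0.9, the quantization scale, and the weight trajectory are all hypothetical.

```python
import numpy as np

def fake_quantize(w, scale, bits=4):
    """Uniform symmetric quantization: snap to the integer grid, clip, rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), qmin, qmax) * scale

def ema_update(ema_w, w, decay=0.9):
    """One exponential-moving-average step over the latent weight."""
    return decay * ema_w + (1.0 - decay) * w

def count_flips(values):
    """Number of consecutive steps at which the quantized value changes."""
    values = np.asarray(values)
    return int(np.sum(~np.isclose(values[1:], values[:-1])))

# Assumed toy trajectory: a latent weight bouncing across the rounding
# boundary at 0.05 (halfway between grid points 0.0 and 0.1).
scale = 0.1
latent = np.array([0.045, 0.052] * 10)

# Quantizing the raw latent weight: the result alternates between 0.0 and 0.1.
raw_q = fake_quantize(latent, scale)

# Quantizing the EMA of the latent weight instead.
ema = latent[0]
ema_q = []
for w in latent:
    ema = ema_update(ema, w)
    ema_q.append(fake_quantize(ema, scale))

raw_flips = count_flips(raw_q)
ema_flips = count_flips(ema_q)
print("flips without EMA:", raw_flips)
print("flips with EMA:", ema_flips)
```

In this toy setting the directly quantized weight changes grid point at every step, while the EMA-smoothed weight stays on a single grid point because its average remains below the rounding boundary. Real training dynamics are noisier, so this only conveys the intuition.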