Efficient inference of object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which directly converts a full-precision model to a low bit-width one, is an effective and convenient approach to reducing inference complexity. However, it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ optimizes the quantization parameters under some metric so as to minimize the perturbation caused by quantization. The p-norm distance between feature maps before and after quantization, Lp, is widely used as this perturbation metric. Owing to the particular nature of object detection networks, we observe that the parameter p in the Lp metric significantly influences their quantization performance, and we find that a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose DetPTQ, a framework that assigns different p values to different layers using an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs this ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that DetPTQ outperforms state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, with 4-bit weights and 4-bit activations, DetPTQ achieves 31.1 mAP on RetinaNet-ResNet18, compared with 31.7 mAP for the full-precision model.
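To make the role of p concrete, the following is a minimal NumPy sketch of Lp-based calibration and per-layer p selection. It is not the authors' implementation. It assumes uniform symmetric per-tensor quantization and a grid search over clipping ratios. The names `quantize`, `lp_distance`, `search_scale`, `select_p`, and the `task_loss` callable are hypothetical. In DetPTQ the score used to choose p would be the ODOL computed on the detector's outputs, which this toy `task_loss` only stands in for.

```python
import numpy as np


def quantize(x, scale, n_bits=4):
    # Uniform symmetric fake-quantization: map to the integer grid, clip, dequantize.
    qmax = 2 ** (n_bits - 1) - 1
    qmin = -(2 ** (n_bits - 1))
    return np.clip(np.round(x / scale), qmin, qmax) * scale


def lp_distance(x, x_q, p):
    # Lp perturbation metric as mean |x - x_q|^p. The 1/p root is omitted
    # because it is monotonic and does not change which scale minimizes it.
    return np.mean(np.abs(x - x_q) ** p)


def search_scale(x, p, n_bits=4, n_grid=100):
    # Grid-search the clipping ratio (fraction of max|x|) that minimizes Lp.
    qmax = 2 ** (n_bits - 1) - 1
    max_abs = np.max(np.abs(x))
    best_scale, best_err = None, np.inf
    for ratio in np.linspace(0.01, 1.0, n_grid):
        scale = ratio * max_abs / qmax
        err = lp_distance(x, quantize(x, scale, n_bits), p)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale


def select_p(x, candidate_ps, task_loss, n_bits=4):
    # For each candidate p, calibrate the scale with the Lp metric, then score
    # the quantized tensor with a task-level loss (a stand-in for ODOL here).
    # Returns (best_p, best_scale, best_loss).
    best = None
    for p in candidate_ps:
        scale = search_scale(x, p, n_bits)
        loss = task_loss(quantize(x, scale, n_bits))
        if best is None or loss < best[2]:
            best = (p, scale, loss)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.laplace(size=10_000)  # heavy-tailed, like many activation tensors
    for p in (1.0, 2.0, 4.0):
        print(f"p={p}: scale={search_scale(x, p):.4f}")
    # Hypothetical task loss: error after a fixed nonlinearity.
    toy_loss = lambda q: float(np.mean((np.tanh(q) - np.tanh(x)) ** 2))
    print(select_p(x, [1.0, 2.0, 4.0], toy_loss))
```

Running the script shows that different values of p generally pick different clipping scales for the same tensor. A larger p penalizes large rounding and clipping errors more heavily. This is the sensitivity that motivates choosing p per layer instead of fixing it globally.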