Although neural networks have made remarkable advances in a wide range of applications, they demand substantial computational and memory resources. Network quantization is a powerful technique for compressing neural networks, enabling more efficient and scalable AI deployment. Recently, re-parameterization has emerged as a promising technique for improving model performance while simultaneously reducing the computational burden in various computer vision tasks. However, accuracy drops significantly when quantization is applied to re-parameterized networks. We identify that the primary challenge arises from the large variation in weight distributions across the original branches. To address this issue, we propose a coarse & fine weight splitting (CFWS) method to reduce weight quantization error, and develop an improved KL metric to determine optimal quantization scales for activations. To the best of our knowledge, ours is the first work that makes post-training quantization applicable to re-parameterized networks. For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss. The code is available at https://github.com/NeonHo/Coarse-Fine-Weight-Split.git
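The abstract attributes the accuracy drop to the differing weight distributions of the original branches. The NumPy sketch below illustrates that effect under simplifying assumptions; it does not implement CFWS or the improved KL metric. RepVGG-style branches (a 3x3 conv, a 1x1 conv, and an identity) are merged into a single 3x3 kernel with batch-norm folding omitted. The Gaussian weight scales are hypothetical, chosen so that the 1x1 and identity contributions are much larger than the 3x3 weights, and the quantizer is a plain symmetric per-output-channel uniform quantizer. Because the 1x1 and identity branches only add to the center tap, the center dominates each channel's quantization scale, and the small off-center 3x3 weights lose most of their resolution.

```python
import numpy as np

def fuse_branches(k3, k1, with_identity):
    """Merge RepVGG-style branches into one 3x3 kernel (BN folding omitted, an assumption).
    k3: (C_out, C_in, 3, 3), k1: (C_out, C_in, 1, 1)."""
    fused = k3.copy()
    # A 1x1 conv equals a 3x3 conv whose only non-zero tap is the center.
    fused[:, :, 1, 1] += k1[:, :, 0, 0]
    if with_identity:
        c = fused.shape[0]
        # The identity equals a 3x3 kernel with 1 at the center of channel i -> i.
        fused[np.arange(c), np.arange(c), 1, 1] += 1.0
    return fused

def quantize_per_channel(w, n_bits=8):
    """Symmetric per-output-channel uniform quantization; returns dequantized weights."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).reshape(w.shape[0], -1).max(axis=1) / qmax
    scale = scale.reshape(-1, 1, 1, 1)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

rng = np.random.default_rng(0)
C = 64
# Hypothetical branch weights: small 3x3 weights, larger 1x1 weights.
k3 = rng.normal(0.0, 0.01, size=(C, C, 3, 3))
k1 = rng.normal(0.0, 0.1, size=(C, C, 1, 1))
fused = fuse_branches(k3, k1, with_identity=True)

# Mean absolute error on the 8 off-center taps, which hold only 3x3-branch weights.
mask = np.ones((3, 3), dtype=bool)
mask[1, 1] = False
err_k3 = np.abs(quantize_per_channel(k3) - k3)[:, :, mask].mean()
err_fused = np.abs(quantize_per_channel(fused) - fused)[:, :, mask].mean()
print(f"off-center MAE, 3x3 branch alone: {err_k3:.2e}")
print(f"off-center MAE, fused kernel:     {err_fused:.2e}")
```

With these assumed distributions, the fused kernel's off-center error is more than an order of magnitude larger than when the 3x3 branch is quantized on its own, even though those off-center weights are identical in both cases.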