Quantization-aware training (QAT) is a common paradigm for network quantization, in which the training phase simulates low-precision computation so that the quantization parameters are optimized in alignment with the task objectives. However, directly training low-precision networks generally faces two obstacles: (1) the low-precision model has limited representational capacity and cannot directly replicate full-precision computation, a deficiency relative to its full-precision counterpart; (2) gradient propagation suffers non-ideal deviations because pseudo-gradients are used to approximate the derivatives of quantization functions. In this paper, we propose a general QAT framework that alleviates both concerns by letting the full-precision partner guide the forward and backward passes of the low-precision network during training. Alongside the direct training of the quantized model, intermediate mixed-precision models are generated by block-by-block replacement on the full-precision model and run simultaneously with the low-precision backbone, which integrates quantized low-precision blocks into full-precision networks throughout the training phase. Consequently, each quantized block can (1) simulate the full-precision representation during the forward pass and (2) obtain better-estimated gradients during the backward pass. We demonstrate that the proposed method achieves state-of-the-art results for 4-, 3-, and 2-bit quantization on ImageNet and CIFAR-10. The framework is a compatible extension for most QAT methods and requires only a concise wrapper around existing code.
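To make the block-by-block replacement idea concrete, the following is a minimal, self-contained sketch in pure Python. The scalar "blocks", the `fake_quant` rounding scheme, and all function names here are illustrative assumptions for exposition, not the paper's actual implementation or API: each intermediate mixed-precision model quantizes exactly one block while the remaining blocks stay full precision.

```python
# Toy sketch of block-by-block replacement (illustrative assumption,
# not the paper's implementation): a "network" is a chain of scalar
# blocks, and a mixed-precision model quantizes exactly one of them.

def fake_quant(x, bits=4):
    """Simulated uniform quantization of a value clipped to [-1, 1]."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels at 4 bits
    x = max(-1.0, min(1.0, x))
    return round(x * levels) / levels

def fp_block(x, w):
    """Toy full-precision block: multiply the activation by a weight."""
    return x * w

def q_block(x, w, bits=4):
    """Low-precision counterpart: quantize input, weight, and output."""
    return fake_quant(fake_quant(x, bits) * fake_quant(w, bits), bits)

def mixed_model(x, weights, quant_idx, bits=4):
    """Run the chain of blocks, replacing only block `quant_idx` with
    its quantized version -- one intermediate mixed-precision model.
    Pass quant_idx=None for the pure full-precision network."""
    for i, w in enumerate(weights):
        x = q_block(x, w, bits) if i == quant_idx else fp_block(x, w)
    return x

weights = [0.5, 0.9, 0.7]
full_out = mixed_model(0.8, weights, quant_idx=None)
# One mixed-precision model per block; because only a single block is
# quantized in each, every output stays close to the full-precision one.
mixed_outs = [mixed_model(0.8, weights, quant_idx=i) for i in range(3)]
```

During QAT, each such mixed model would supply the quantized block with full-precision activations on the forward pass and full-precision gradients on the backward pass, which is the guidance the abstract describes.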