Quantization-aware training (QAT) is a popular approach to network compression that accelerates the forward pass during neural network training and inference. However, little prior work has attempted to quantize and accelerate the backward pass during training, even though it accounts for roughly half of the training time. This can be partly attributed to the fact that, unlike in the QAT setting, the errors of low-precision gradients in the backward pass cannot be amortized by the training objective. In this work, we propose to solve this problem by incorporating the gradients into the computation graph of the next training iteration via a hypernetwork. Experiments on the CIFAR-10 dataset with several CNN architectures demonstrate that our hypernetwork-based approach effectively reduces the negative effect of gradient quantization noise and successfully quantizes the gradients to INT4 with an accuracy drop of only 0.64% for VGG-16.
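To make the notion of gradient quantization noise concrete, the following is a minimal sketch of symmetric per-tensor INT4 quantization applied to a synthetic gradient vector. It is not the hypernetwork-based method proposed in this work; the function names `quantize_int4` and `dequantize`, the symmetric range [-7, 7], and the Gaussian gradient values are all assumptions made for illustration.

```python
import random


def quantize_int4(grads):
    """Symmetric per-tensor quantization of floats to signed 4-bit codes.

    Assumption for this sketch: codes are restricted to [-7, 7] so that
    the range is symmetric around zero.
    """
    qmax = 7
    max_abs = max(abs(g) for g in grads)
    # Fall back to a scale of 1.0 when every gradient is exactly zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(g / scale))) for g in grads]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point values."""
    return [c * scale for c in codes]


# Synthetic "gradients" (hypothetical values, not taken from any experiment).
random.seed(0)
grads = [random.gauss(0.0, 0.01) for _ in range(1000)]

codes, scale = quantize_int4(grads)
approx = dequantize(codes, scale)

# The per-element quantization error is bounded by half a quantization step.
max_err = max(abs(g - a) for g, a in zip(grads, approx))
print(f"scale={scale:.6f}, max abs error={max_err:.6f}")
```

With only 15 representable levels, each gradient element can be off by up to half a quantization step. This is the kind of noise that the training objective does not directly compensate for in the backward pass.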