In this paper, we propose a feature affinity (FA) assisted knowledge distillation (KD) method to improve quantization-aware training of deep neural networks (DNNs). In conventional KD, the loss acts only on the network logits at the output level, which amounts to giving the student only the final answers. The FA loss instead acts on intermediate feature maps of the DNNs and plays the role of teaching the student the intermediate steps of a solution. Combining the logit loss and the FA loss, we find that the quantized student network receives stronger supervision than it does from labeled ground-truth data. The resulting method, FAQD, can compress models on label-free data. This brings immediate practical benefits: pre-trained teacher models are readily available and unlabeled data are abundant, whereas data labeling is often laborious and expensive. Finally, we propose a fast feature affinity (FFA) loss that accurately approximates the FA loss at a lower order of computational complexity, which helps speed up training on high-resolution image inputs.
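The abstract does not spell out the FA loss, so the following is only a rough sketch under an assumed formulation. Each spatial position of a (C, H, W) feature map is treated as a C-dimensional vector and L2-normalized. The affinity matrix holds all pairwise cosine similarities between positions, and the FA loss is taken as the mean squared difference between the student's and the teacher's affinity matrices. This is combined with a temperature-softened KL logit loss, and no labels are used. The function names `affinity`, `fa_loss`, `kd_logit_loss`, `faqd_style_loss`, the weight `beta`, and the temperature `T` are hypothetical. This is not the paper's implementation, and it does not implement FFA.

```python
import numpy as np

def affinity(feat):
    """Assumed feature affinity: cosine similarity between every pair of
    spatial positions of a (C, H, W) feature map -> (H*W, H*W) matrix.
    Cost is O((H*W)^2 * C), which grows quickly with image resolution."""
    c = feat.shape[0]
    f = feat.reshape(c, -1)                                        # (C, HW)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-12)    # unit columns
    return f.T @ f                                                 # (HW, HW)

def fa_loss(student_feat, teacher_feat):
    """Mean squared difference of affinity matrices. Spatial sizes must match,
    but channel counts may differ because the matrix is (HW, HW)."""
    a_s = affinity(student_feat)
    a_t = affinity(teacher_feat)
    return float(np.mean((a_s - a_t) ** 2))

def softmax(z, T):
    """Numerically stable softmax of a 1-D logit vector at temperature T."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_logit_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on softened probabilities, scaled by T^2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(T * T * np.sum(p_t * (np.log(p_t) - np.log(p_s))))

def faqd_style_loss(s_logits, t_logits, s_feats, t_feats, beta=1.0, T=4.0):
    """Label-free objective (hypothetical weighting): logit KD plus the
    FA loss summed over matched intermediate feature maps."""
    fa = sum(fa_loss(s, t) for s, t in zip(s_feats, t_feats))
    return kd_logit_loss(s_logits, t_logits, T) + beta * fa
```

For example, a student feature map of shape (8, 4, 4) can be matched against a teacher map of shape (16, 4, 4), since only the spatial size has to agree. The quadratic dependence on H*W in `affinity` shows why a cheaper approximation such as the FFA loss matters for high-resolution inputs.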