Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by having the student mimic the teacher. However, distilling by directly aligning the feature maps of the teacher and the student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student with pixel-wise transformations. In this paper, we find that aligning the feature maps of the teacher and the student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student and the teacher. Building on it, we further propose a simple and generic framework for feature distillation, with only one hyper-parameter to balance the distillation loss and the task-specific loss. Extensive experimental results show that our method achieves significant performance improvements in various computer vision tasks, including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster-RCNN on MS COCO), instance segmentation (+2.8% Mask mAP for ResNet50-based Mask-RCNN on MS COCO), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on Cityscapes), which demonstrates the effectiveness and versatility of the proposed method. The code will be made publicly available.
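To make the idea of a channel-wise transformation and a single balancing hyper-parameter concrete, the following is a minimal forward-pass sketch in NumPy. The abstract does not specify the exact form of the transformation or of the distillation loss, so several choices here are assumptions: the transformation is taken to be a two-layer MLP over channels (linear, ReLU, linear) shared across spatial positions, it is applied to the student features, the distillation loss is a mean squared error, and the names `channel_transform`, `distillation_loss`, `total_loss`, and `alpha` are hypothetical. In actual training the transformation weights would be learned jointly with the student; this sketch only evaluates the loss.

```python
import numpy as np


def channel_transform(feat, w1, b1, w2, b2):
    """Apply a two-layer MLP along the channel axis of a (C, H, W) feature map.

    Assumed form (not given in the abstract): linear -> ReLU -> linear, shared
    across all spatial positions, i.e. equivalent to two 1x1 convolutions.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)              # (C_in, H*W): one column per pixel
    hidden = np.maximum(w1 @ x + b1, 0.0)   # (C_hidden, H*W), ReLU nonlinearity
    out = w2 @ hidden + b2                  # (C_out, H*W)
    return out.reshape(-1, h, w)


def distillation_loss(student_feat, teacher_feat, params):
    """Assumed loss: MSE between transformed student features and teacher features."""
    aligned = channel_transform(student_feat, *params)
    return float(np.mean((aligned - teacher_feat) ** 2))


def total_loss(task_loss, student_feat, teacher_feat, params, alpha):
    """Task loss plus alpha-weighted distillation loss (alpha is the one hyper-parameter)."""
    return task_loss + alpha * distillation_loss(student_feat, teacher_feat, params)


# Toy shapes: student has 4 channels, teacher has 8, both on a 5x5 spatial grid.
rng = np.random.default_rng(0)
c_s, c_t, c_hidden, h, w = 4, 8, 16, 5, 5
student_feat = rng.standard_normal((c_s, h, w))
teacher_feat = rng.standard_normal((c_t, h, w))
params = (
    rng.standard_normal((c_hidden, c_s)) * 0.1,  # w1: student channels -> hidden
    np.zeros((c_hidden, 1)),                     # b1
    rng.standard_normal((c_t, c_hidden)) * 0.1,  # w2: hidden -> teacher channels
    np.zeros((c_t, 1)),                          # b2
)
loss = total_loss(1.0, student_feat, teacher_feat, params, alpha=0.5)
```

Because the transformation mixes channels independently at each pixel, it leaves the spatial layout untouched, in contrast to the pixel-wise spatial alignment the abstract attributes to prior work. Setting `alpha` to zero recovers plain task training.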