Feature-based knowledge distillation is a general model compression paradigm in which the student model learns expressive features from its teacher. In this paper, we focus on designing an effective feature-distillation framework and propose a spatial-channel adaptive masked distillation (AMD) network for object detection. Specifically, to accurately reconstruct important feature regions, we first perform attention-guided feature masking on the feature map of the student network, so that important features are identified via spatially adaptive masking rather than the random masking used in previous methods. In addition, we employ a simple and efficient module that makes the student network channel-adaptive, improving its capability in object perception and detection. Compared with previous methods, the proposed network reconstructs and learns more crucial object-aware features, which is conducive to accurate object detection. Experiments demonstrate the superiority of our method: with the proposed distillation, the student networks achieve 41.3\%, 42.4\%, and 42.7\% mAP when RetinaNet, Cascade Mask-RCNN, and RepPoints are respectively used as the teacher framework, outperforming previous state-of-the-art distillation methods including FGD and MGD.
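To make the contrast between attention-guided and random masking concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The abstract does not specify how the attention map is computed, which network it is derived from, which positions are masked, or how the channel-adaptive module is built, so everything below is an assumption made for illustration: the attention map is taken as the temperature-scaled softmax of the channel-wise mean absolute activation, the most-attended positions are the ones masked, and the channel module is a squeeze-and-excitation-style gate. The names `spatial_attention`, `attention_guided_mask`, `random_mask`, and `channel_gate`, as well as the parameters `temperature` and `mask_ratio`, are hypothetical.

```python
import numpy as np


def spatial_attention(feat, temperature=0.5):
    """Spatial attention for a (C, H, W) feature map (assumed form).

    Uses a softmax over all H*W positions of the channel-wise mean
    absolute activation. Returns an (H, W) map that sums to 1.
    """
    _, h, w = feat.shape
    energy = np.abs(feat).mean(axis=0).reshape(-1) / temperature
    energy = energy - energy.max()  # subtract the max for numerical stability
    weights = np.exp(energy)
    return (weights / weights.sum()).reshape(h, w)


def attention_guided_mask(attn, mask_ratio=0.5):
    """Binary (H, W) mask: 0 at the most-attended positions, 1 elsewhere.

    Masking the high-attention positions is an assumption of this sketch.
    """
    flat = attn.reshape(-1)
    num_masked = int(round(mask_ratio * flat.size))
    mask = np.ones(flat.size)
    if num_masked > 0:
        mask[np.argsort(flat)[-num_masked:]] = 0.0
    return mask.reshape(attn.shape)


def random_mask(shape, mask_ratio, rng):
    """Baseline: mask the same number of positions, chosen uniformly at random."""
    size = shape[0] * shape[1]
    num_masked = int(round(mask_ratio * size))
    mask = np.ones(size)
    mask[rng.choice(size, size=num_masked, replace=False)] = 0.0
    return mask.reshape(shape)


def channel_gate(feat, w1, w2):
    """Squeeze-and-excitation-style channel gate (an assumed design).

    feat: (C, H, W); w1: (C // r, C); w2: (C, C // r).
    """
    pooled = feat.mean(axis=(1, 2))               # squeeze: (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid gate per channel
    return feat * scale[:, None, None]


rng = np.random.default_rng(0)
teacher_feat = rng.normal(size=(4, 6, 6))
student_feat = rng.normal(size=(4, 6, 6))

# Here the attention map is derived from the teacher features (an assumption).
mask = attention_guided_mask(spatial_attention(teacher_feat), mask_ratio=0.25)
masked_student = student_feat * mask  # (H, W) mask broadcasts over channels
```

In a masked-distillation setup, a generation module would then be trained to reconstruct the teacher features from `masked_student`; that reconstruction step is omitted here because the abstract gives no detail on it.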