Feature-based knowledge distillation is a general model compression paradigm in which a student model learns expressive features from a teacher model. In this paper, we focus on designing an effective feature-distillation framework and propose a spatial-channel adaptive masked distillation (AMD) network for object detection. To reconstruct important feature regions accurately, we first perform attention-guided masking on the student network's feature map, so that important features are identified by spatially adaptive masking rather than by the random masking used in previous methods. In addition, we employ a simple and efficient module that makes the student network's channels adaptive, improving its capability in object perception and detection. Compared with previous methods, the proposed network reconstructs and learns more of the crucial object-aware features, which is conducive to accurate object detection. Experiments demonstrate the superiority of our method: with the proposed distillation, the student networks reach 41.3%, 42.4%, and 42.7% mAP when RetinaNet, Cascade Mask-RCNN, and RepPoints, respectively, are used as the teacher framework, outperforming previous state-of-the-art distillation methods including FGD and MGD.
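To make the two ingredients named above concrete, the following is a minimal NumPy sketch of attention-guided spatial masking followed by a channel-adaptive gate and a reconstruction loss. It is an illustration under stated assumptions, not the paper's implementation. The abstract does not say how the attention map is computed, so this sketch assumes it is derived from the teacher feature map. It also assumes the most-attended positions are the ones masked, and that the channel module is a squeeze-and-excitation style gate. All function names, the `temperature` and `mask_ratio` parameters, and the weights `w1` and `w2` are hypothetical. A real pipeline would also place a learned generation block between masking and the loss, which is omitted here.

```python
import numpy as np


def spatial_attention(feat, temperature=0.5):
    """Softmax over spatial positions of the channel-averaged |activation|.

    feat has shape (C, H, W); the result has shape (H, W) and sums to 1.
    """
    _, h, w = feat.shape
    energy = np.abs(feat).mean(axis=0).reshape(-1) / temperature
    energy -= energy.max()  # subtract the max for numerical stability
    attn = np.exp(energy)
    attn /= attn.sum()
    return attn.reshape(h, w)


def attention_guided_mask(attn, mask_ratio=0.5):
    """Binary (H, W) mask: 0 at the most-attended positions, 1 elsewhere.

    Assumption: the top `mask_ratio` fraction of positions is masked,
    replacing the uniformly random choice used in random masking.
    """
    h, w = attn.shape
    k = int(round(mask_ratio * h * w))
    mask = np.ones(h * w, dtype=np.float32)
    if k > 0:
        top = np.argsort(attn.reshape(-1))[-k:]  # indices of the k largest values
        mask[top] = 0.0
    return mask.reshape(h, w)


def channel_gate(feat, w1, w2):
    """SE-style channel gate (an assumed stand-in for the channel module).

    Global average pool -> linear -> ReLU -> linear -> sigmoid, then rescale.
    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    squeezed = feat.mean(axis=(1, 2))             # (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)       # (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,), values in (0, 1)
    return feat * gate[:, None, None]


def masked_distill_loss(student, teacher, w1, w2, mask_ratio=0.5):
    """Mean squared error between the masked, gated student and the teacher."""
    attn = spatial_attention(teacher)
    mask = attention_guided_mask(attn, mask_ratio)
    masked = student * mask[None]                 # broadcast the mask over channels
    adapted = channel_gate(masked, w1, w2)
    return float(np.mean((adapted - teacher) ** 2))
```

A call such as `masked_distill_loss(student, teacher, w1, w2)` with `student` and `teacher` of shape `(C, H, W)` returns a single non-negative scalar. In training, this term would be added to the detector's task loss with a weighting factor, which the sketch does not specify.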