Distilling from feature maps can be fairly effective for dense prediction tasks, since both feature discriminability and localization priors transfer well. However, not every pixel contributes equally to performance, and a good student should learn from what really matters to the teacher. In this paper, we introduce a learnable embedding dubbed receptive token to localize the pixels of interest (PoIs) in the feature map, with a distillation mask generated via pixel-wise attention. Distillation is then performed with this mask via pixel-wise reconstruction. In this way, a distillation mask indicates a pattern of pixel dependencies within the teacher's feature maps. We therefore adopt multiple receptive tokens to investigate more sophisticated and informative pixel dependencies and further enhance the distillation. To obtain a group of masks, the receptive tokens are learned via the regular task loss with the teacher fixed, and we also leverage a Dice loss to enrich the diversity of the learned masks. Our method, dubbed MasKD, is simple and practical, and requires no task-specific priors in application. Experiments show that MasKD consistently achieves state-of-the-art performance on object detection and semantic segmentation benchmarks. Code is available at: https://github.com/hunto/MasKD .
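The mask-guided distillation described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the MasKD reference implementation. The sigmoid over scaled token-pixel dot products, the per-mask normalization of the reconstruction error, and the helper names (`receptive_masks`, `masked_distill_loss`, `dice_diversity_loss`) are hypothetical choices made here. Learning the tokens (via the task loss with the teacher frozen) and any student-side feature generation step are omitted, and the paper's actual reconstruction step may differ.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def receptive_masks(tokens: np.ndarray, feat: np.ndarray) -> np.ndarray:
    """Hypothetical pixel-wise attention between K tokens (K, C) and a map (C, H, W).

    Assumption: sigmoid of scaled dot products gives K soft masks in (0, 1).
    """
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)              # (C, N) per-pixel embeddings
    logits = tokens @ flat / np.sqrt(c)        # (K, N) token-pixel similarity
    return sigmoid(logits).reshape(-1, h, w)   # (K, H, W) soft masks


def masked_distill_loss(masks, feat_t, feat_s, eps=1e-6):
    """Mask-weighted pixel-wise squared error, normalized per mask, then averaged."""
    sq_err = ((feat_t - feat_s) ** 2).sum(axis=0)           # (H, W) error per pixel
    weighted = (masks * sq_err).sum(axis=(1, 2))            # (K,) weighted error per mask
    return float((weighted / (masks.sum(axis=(1, 2)) + eps)).mean())


def dice_diversity_loss(masks, eps=1e-6):
    """Mean pairwise Dice coefficient between masks; lower means more diverse."""
    n_masks = masks.shape[0]
    if n_masks < 2:
        return 0.0                                          # no pairs to compare
    flat = masks.reshape(n_masks, -1)
    inter = flat @ flat.T                                   # (K, K) pairwise overlaps
    sq = (flat ** 2).sum(axis=1)                            # (K,) squared mask norms
    dice = 2.0 * inter / (sq[:, None] + sq[None, :] + eps)
    iu = np.triu_indices(n_masks, k=1)                      # index pairs i < j
    return float(dice[iu].mean())


# Usage on random toy features (teacher vs. slightly perturbed student).
rng = np.random.default_rng(0)
C, H, W, K = 8, 4, 4, 3
feat_t = rng.standard_normal((C, H, W))
feat_s = feat_t + 0.1 * rng.standard_normal((C, H, W))
tokens = rng.standard_normal((K, C))
masks = receptive_masks(tokens, feat_t)
print(masks.shape)  # (3, 4, 4)
print(masked_distill_loss(masks, feat_t, feat_s), dice_diversity_loss(masks))
```

Minimizing `dice_diversity_loss` pushes the masks toward covering different pixels, which matches the abstract's stated purpose for the Dice loss. In an actual training loop these would be differentiable tensors, with gradients from the task loss updating the tokens while the teacher stays fixed.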