In recent years, knowledge distillation (KD) has been widely used to derive efficient models. By imitating a large teacher model, a lightweight student model can achieve comparable performance with greater efficiency. However, most existing KD methods focus on classification tasks. Only a limited number of studies have applied KD to object detection, especially in time-sensitive autonomous driving scenarios. In this paper, we propose Adaptive Instance Distillation (AID), which selectively imparts the teacher's knowledge to the student to improve distillation performance. Unlike previous KD methods that treat all instances equally, AID adaptively adjusts the distillation weight of each instance based on the teacher model's prediction loss. We verify the effectiveness of AID through experiments on the KITTI and COCO traffic datasets. The results show that AID improves the performance of state-of-the-art attention-guided and non-local distillation methods and achieves better distillation results on both single-stage and two-stage detectors. Compared to the baseline, AID yields average mAP increases of 2.7% and 2.1% for single-stage and two-stage detectors, respectively. Furthermore, AID is also shown to be useful for self-distillation, improving the teacher model's own performance.
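The instance-weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract states only that distillation weights depend on the teacher's prediction loss; it does not give the weighting function or its direction. The exponential form `exp(-alpha * teacher_loss)`, the parameter `alpha`, and the function names `aid_weights` and `aid_distillation_loss` below are therefore assumptions made for illustration, not the formulation used in the paper.

```python
import math


def aid_weights(teacher_losses, alpha=1.0):
    """Map each instance's teacher prediction loss to a weight in (0, 1].

    ASSUMPTION: the exponential form and the `alpha` temperature are
    illustrative choices. Here, instances the teacher predicts poorly
    (high loss) receive a smaller distillation weight.
    """
    return [math.exp(-alpha * loss) for loss in teacher_losses]


def aid_distillation_loss(distill_losses, teacher_losses, alpha=1.0):
    """Average per-instance distillation losses, weighted per instance.

    distill_losses: per-instance distillation loss between student and teacher.
    teacher_losses: per-instance prediction loss of the teacher on the labels.
    """
    if len(distill_losses) != len(teacher_losses):
        raise ValueError("expected one teacher loss per distillation loss")
    if not distill_losses:
        return 0.0
    weights = aid_weights(teacher_losses, alpha)
    weighted_sum = sum(w * d for w, d in zip(weights, distill_losses))
    return weighted_sum / len(distill_losses)


if __name__ == "__main__":
    # Two instances with equal distillation loss; the teacher is confident
    # on the first (low loss) and unreliable on the second (high loss).
    print(aid_weights([0.1, 2.0]))
    print(aid_distillation_loss([1.0, 1.0], [0.1, 2.0]))
```

With uniform weights this reduces to the ordinary mean distillation loss, which corresponds to the equal-treatment baseline the abstract contrasts AID against. In a real detector the per-instance losses would come from the detection heads, and the teacher's weights would typically be treated as constants during student training.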