Knowledge distillation (KD) has shown potential for learning compact models in dense object detection. However, the commonly used softmax-based distillation ignores the absolute classification scores of individual categories, so the optimum of the distillation loss does not necessarily yield optimal student classification scores for dense object detectors. This cross-task protocol inconsistency is especially critical for dense object detectors, because the foreground categories are extremely imbalanced. To address the protocol difference between distillation and classification, we propose a novel distillation method with cross-task consistent protocols, tailored for dense object detection. For classification distillation, we resolve the inconsistency by formulating the classification logit maps of both the teacher and student models as multiple binary-classification maps and applying a binary-classification distillation loss to each map. For localization distillation, we design an IoU-based Localization Distillation Loss that is free from specific network structures and can be compared with existing localization distillation losses. Our proposed method is simple but effective, and experimental results demonstrate its superiority over existing methods. Code is available at https://github.com/TinyTigerPan/BCKD.
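The two losses described above can be illustrated with a minimal pure-Python sketch. It is not the released BCKD implementation: the function names, the unweighted mean reduction, and the plain `1 - IoU` form of the localization term are assumptions made for illustration, and any loss weighting the method may use is omitted. The sketch treats each (location, category) logit as an independent binary problem with a sigmoid score, and contrasts this with softmax-based KD, which is invariant to a constant shift of the logits and therefore cannot see a change in absolute scores.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_kd_loss(student_logits, teacher_logits):
    """KL(teacher || student) over softmax probabilities (conventional KD)."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return sum(t * math.log(t / s) for t, s in zip(p_t, p_s))


def binary_kd_loss(student_logits, teacher_logits, eps=1e-12):
    """Mean binary cross-entropy between teacher and student sigmoid scores.

    Each logit is one element of a binary-classification map.
    Assumption: unweighted mean over all elements.
    """
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        p_s = min(max(sigmoid(s), eps), 1.0 - eps)  # clamp to avoid log(0)
        p_t = sigmoid(t)
        total -= p_t * math.log(p_s) + (1.0 - p_t) * math.log(1.0 - p_s)
    return total / len(student_logits)


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loc_kd_loss(student_box, teacher_box):
    """Assumed form: 1 - IoU between the student's and teacher's boxes."""
    return 1.0 - iou(student_box, teacher_box)


teacher = [2.0, -1.0, -3.0]
shifted_student = [x - 4.0 for x in teacher]  # same ranking, lower absolute scores

# Softmax KD cannot distinguish the shifted student from the teacher ...
print(softmax_kd_loss(shifted_student, teacher))
# ... while the binary loss is larger than its value at a perfect match.
print(binary_kd_loss(shifted_student, teacher), binary_kd_loss(teacher, teacher))
print(iou_loc_kd_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Because the localization term in this sketch depends only on the decoded boxes, it does not require a particular box-regression head, which is consistent with the claim that the loss is free from specific network structures.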