Knowledge distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model. Because the teacher model perceives data differently from humans, existing KD methods distill only the knowledge that is consistent with labels annotated by human experts, while neglecting knowledge that is inconsistent with human perception; this leads to insufficient distillation and sub-optimal performance. In this paper, we propose inconsistent knowledge distillation (IKD), which aims to distill the knowledge inherent in the teacher model's counter-intuitive perceptions. We start from the teacher model's counter-intuitive perceptions of frequency and of non-robust features. Unlike previous works that exploit fine-grained features or introduce additional regularization, we extract inconsistent knowledge by providing diverse inputs through data augmentation. Specifically, we propose a sample-specific data augmentation to transfer the teacher model's ability to capture distinct frequency components, and an adversarial feature augmentation to extract the teacher model's perception of non-robust features in the data. Extensive experiments demonstrate the effectiveness of our method, which outperforms state-of-the-art KD baselines on one-stage, two-stage, and anchor-free object detectors (by up to +1.0 mAP). Our code will be made available at \url{https://github.com/JWLiang007/IKD.git}.
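The abstract gives only a high-level description of the two augmentations; as a rough illustration, the following PyTorch sketch shows one plausible form of each. All names and objectives here (the per-sample frequency cutoff, the FGSM-style feature-discrepancy step, the \texttt{epsilon} budget) are our assumptions for illustration, not the paper's actual implementation.

\begin{verbatim}
# Illustrative sketch only -- the paper's actual augmentations may differ.
import torch
import torch.nn.functional as F

def frequency_augment(images, cutoff_ratio):
    """Sample-specific frequency augmentation (assumed design).

    images: (B, C, H, W) batch; cutoff_ratio: length-B sequence in (0, 1]
    giving, per sample, how much of the centered spectrum is retained.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    B, C, H, W = images.shape
    mask = torch.zeros(B, 1, H, W, device=images.device)
    for i in range(B):
        # Keep a centered low-frequency band whose size varies per sample.
        h = int(H * cutoff_ratio[i] / 2)
        w = int(W * cutoff_ratio[i] / 2)
        mask[i, :, H // 2 - h:H // 2 + h, W // 2 - w:W // 2 + w] = 1.0
    spec = spec * mask
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

def adversarial_feature_augment(images, teacher, student, epsilon=2.0 / 255):
    """One FGSM-style step enlarging the teacher-student feature discrepancy
    (assumed objective), surfacing non-robust features the teacher relies on.

    teacher/student are assumed to map images to feature maps of equal shape.
    """
    x = images.clone().detach().requires_grad_(True)
    loss = F.mse_loss(student(x), teacher(x).detach())
    grad = torch.autograd.grad(loss, x)[0]
    return (x + epsilon * grad.sign()).detach()
\end{verbatim}

Under these assumptions, the frequency augmentation exposes the student to inputs whose spectral content the teacher perceives differently from humans, while the adversarial step perturbs inputs toward the non-robust features on which the teacher's predictions depend.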