Knowledge distillation (KD) aims to transfer the knowledge of a more capable yet cumbersome teacher model to a lightweight student model. In recent years, relation-based KD methods have fallen behind, as their instance-matching counterparts dominate in performance. In this paper, we revive relational KD by identifying and tackling several key issues in relation-based methods, including their susceptibility to overfitting and spurious responses. Specifically, we exploit virtual views and relations as a new kind of knowledge: we transfer newly constructed affinity graphs that compactly encode a wealth of beneficial inter-sample, inter-class, and inter-view correlations. As a result, the student has access to richer guidance signals and stronger regularisation throughout the distillation process. To further mitigate the adverse impact of spurious responses, we prune the affinity graphs by dynamically detaching redundant and unreliable edges. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate the superior performance of the proposed virtual relation matching (VRM) method across a range of models, architectures, and set-ups. For instance, VRM is the first to reach 74.0% accuracy for ResNet50-to-MobileNetV2 distillation on ImageNet, and it improves DeiT-T by 14.44% on CIFAR-100 with a ResNet56 teacher. Thorough analyses are also conducted to gauge the soundness, properties, and complexity of our designs. Code and models will be released.
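The two core ideas in the abstract — matching affinity graphs built over a batch that mixes real and virtual (augmented) views, and pruning unreliable edges before matching — can be illustrated with a minimal PyTorch sketch. This is not the paper's released implementation; the cosine-similarity graph, the `keep_ratio` threshold, and the MSE matching objective are all illustrative stand-ins for the method described above.

```python
import torch
import torch.nn.functional as F

def affinity_graph(feats: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity graph over a batch of features.

    feats: (N, D). If the rows mix real samples with virtual
    (augmented) views, the resulting (N, N) graph encodes both
    inter-sample and inter-view correlations.
    """
    z = F.normalize(feats, dim=1)
    return z @ z.t()

def pruned_relation_loss(f_s: torch.Tensor,
                         f_t: torch.Tensor,
                         keep_ratio: float = 0.5) -> torch.Tensor:
    """Match student and teacher affinity graphs on the most
    reliable edges only.

    Here "reliable" is approximated by the strength of the teacher
    affinity: the weakest (1 - keep_ratio) fraction of edges is
    masked out, a simple stand-in for the dynamic edge-detaching
    step sketched in the abstract.
    """
    g_s = affinity_graph(f_s)
    g_t = affinity_graph(f_t)
    k = max(1, int(keep_ratio * g_t.numel()))
    # threshold = smallest value among the top-k teacher edges
    thresh = g_t.flatten().topk(k).values.min()
    mask = (g_t >= thresh).float()
    # mean squared error over the surviving edges
    return ((g_s - g_t) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```

Note that the student and teacher features may have different dimensions (e.g. MobileNetV2 vs. ResNet50 embeddings); only the batch sizes must agree, since the graphs are compared edge-by-edge rather than feature-by-feature.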