Teacher-free online Knowledge Distillation (KD) aims to train an ensemble of multiple student models collaboratively so that they distill knowledge from one another. Although existing online KD methods achieve desirable performance, they often focus on class probabilities as the core knowledge type and ignore the valuable information carried by feature representations. We present a Mutual Contrastive Learning (MCL) framework for online KD. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort of networks in an online manner. MCL aggregates cross-network embedding information and maximizes a lower bound on the mutual information between two networks. This enables each network to learn extra contrastive knowledge from the others, leading to better feature representations and thus to improved performance on visual recognition tasks. Beyond the final layer, we extend MCL to intermediate layers and employ an adaptive layer-matching mechanism trained by meta-optimization. Experiments on image classification and on transfer learning to visual recognition tasks show that layer-wise MCL yields consistent performance gains over state-of-the-art online KD approaches. These gains demonstrate that layer-wise MCL can guide the network to generate better feature representations. Our code is publicly available at https://github.com/winycg/L-MCL.
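To make the idea of transferring contrastive distributions between networks more concrete, the following is a minimal NumPy sketch, not the authors' implementation (for that, see the repository linked above). It assumes two peer networks that have already produced embeddings `z_a` and `z_b` for the same mini-batch, with row `i` of each matrix corresponding to the same sample. The function names `cross_network_infonce` and `distribution_mimicry_kl`, the temperature `tau`, and the use of in-batch negatives are illustrative assumptions. The first function is a standard InfoNCE loss with anchors from one network and candidates from the other; with N candidates per anchor, log N minus this loss is a lower bound on the mutual information between the two embeddings. The second function measures how far one network's contrastive distribution is from its peer's.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def cross_network_infonce(z_a, z_b, tau=0.1):
    """InfoNCE loss with anchors from network a and candidates from network b.

    Row i of z_a and row i of z_b embed the same sample (the positive pair);
    every other row of z_b acts as a negative for anchor i.
    """
    logits = l2_normalize(z_a) @ l2_normalize(z_b).T / tau
    return float(-np.mean(np.diag(log_softmax(logits))))


def within_network_log_probs(z, tau=0.1):
    """Log contrastive distribution of each sample over the other N-1 samples."""
    n = z.shape[0]
    unit = l2_normalize(z)
    sim = unit @ unit.T / tau
    # Drop the diagonal so a sample is never compared with itself.
    off_diag = sim[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    return log_softmax(off_diag)


def distribution_mimicry_kl(z_student, z_peer, tau=0.1):
    """Mean KL(p_peer || p_student) between the two contrastive distributions."""
    log_p_s = within_network_log_probs(z_student, tau)
    log_p_t = within_network_log_probs(z_peer, tau)
    kl_per_row = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1)
    return float(np.mean(kl_per_row))


# Toy usage: random vectors stand in for the outputs of two peer networks.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(8, 16))
z_b = z_a + 0.1 * rng.normal(size=(8, 16))  # a peer that roughly agrees with a

loss = cross_network_infonce(z_a, z_b)
mi_lower_bound = np.log(len(z_a)) - loss    # InfoNCE bound: I(a; b) >= log N - loss
mimicry = distribution_mimicry_kl(z_a, z_b)
```

In an actual online KD setup, both terms would be computed symmetrically for every pair of networks in the cohort and added to each network's supervised loss, with gradients flowing only into the network being updated. Those training details are omitted here.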