In this paper, we present a simple yet effective contrastive knowledge distillation approach, which can be formulated as a sample-wise alignment problem with intra- and inter-sample constraints. Unlike traditional knowledge distillation methods that focus on maximizing feature similarities or preserving class-wise semantic correlations between teacher and student features, our method attempts to recover the "dark knowledge" by aligning sample-wise teacher and student logits. Specifically, our method first minimizes logit differences within the same sample by considering their numerical values, thus preserving intra-sample similarities. Next, we bridge semantic disparities by leveraging dissimilarities across different samples. Note that the constraints on intra-sample similarities and inter-sample dissimilarities can be efficiently and effectively reformulated into a contrastive learning framework with newly designed positive and negative pairs. A positive pair consists of the teacher's and student's logits derived from the same sample, while negative pairs are formed from logits of different samples. With this formulation, our method benefits from the simplicity and efficiency of contrastive learning through the optimization of InfoNCE, yielding a run-time complexity far below $O(n^2)$, where $n$ denotes the total number of training samples. Furthermore, our method eliminates the need for hyperparameter tuning, particularly of temperature parameters, and does not require large batch sizes. We conduct comprehensive experiments on three datasets: CIFAR-100, ImageNet-1K, and MS COCO. Experimental results clearly confirm the effectiveness of the proposed method on both image classification and object detection tasks. Our source code will be made publicly available at https://github.com/wencheng-zhu/CKD.
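To make the pair construction concrete, the following is a minimal NumPy sketch of a temperature-free InfoNCE loss over sample-wise logits, where the positive pair is the teacher/student logit pair from the same sample (the diagonal of the similarity matrix) and the negatives come from other samples in the batch. The function name `ckd_infonce_loss` and the cosine-similarity choice are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def ckd_infonce_loss(student_logits, teacher_logits):
    """Sketch of a contrastive KD objective (illustrative, not the
    official CKD implementation): positives are teacher/student logits
    of the same sample; negatives pair logits across different samples."""
    # L2-normalize so the dot product is a cosine similarity (assumption).
    s = student_logits / np.linalg.norm(student_logits, axis=1, keepdims=True)
    t = teacher_logits / np.linalg.norm(teacher_logits, axis=1, keepdims=True)
    sim = s @ t.T  # (n, n) pairwise similarities; diagonal = positive pairs
    # InfoNCE without a temperature hyperparameter: row-wise log-softmax,
    # with the matching sample index as the target.
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

Because each batch only compares its own $n_b$ samples against one another, the per-step cost is $O(n_b^2)$ in the batch size rather than $O(n^2)$ in the dataset size, which is consistent with the complexity claim above.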