Knowledge distillation has emerged as a highly effective method for bridging the representation gap between large-scale models and lightweight models. Prevalent approaches leverage appropriate metrics to minimize the divergence or distance between the knowledge extracted from the teacher model and the knowledge learned by the student model. Centered Kernel Alignment (CKA) is widely used to measure representation similarity and has been applied in several knowledge distillation methods. However, these methods are complex and fail to uncover the essence of CKA, leaving open the question of how to use CKA properly to achieve simple and effective distillation. This paper first provides a theoretical perspective on the effectiveness of CKA, decoupling CKA into an upper bound of Maximum Mean Discrepancy (MMD) and a constant term. Drawing from this, we propose a novel Relation-Centered Kernel Alignment (RCKA) framework, which establishes a practical connection between CKA and MMD. Furthermore, we dynamically customize the application of CKA to the characteristics of each task, requiring fewer computational resources yet achieving performance comparable to previous methods. Extensive experiments on CIFAR-100, ImageNet-1k, and MS-COCO demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs for image classification and object detection, validating the effectiveness of our approach. Our code is available at https://github.com/Klayand/PCKA
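For readers unfamiliar with the two quantities the abstract relates, the following is a minimal NumPy sketch of their standard definitions: linear CKA (Kornblith et al.) and a biased empirical RBF-kernel MMD between teacher and student feature matrices. It illustrates the textbook formulas only, not the RCKA implementation from the repository; the function names, the bandwidth `sigma`, and the demo shapes are illustrative assumptions.

```python
import numpy as np

def center(K):
    """Center a Gram matrix: H K H with H = I - (1/n) * ones."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n, d1) and Y (n, d2)."""
    Kx = center(X @ X.T)
    Ky = center(Y @ Y.T)
    hsic = np.sum(Kx * Ky)  # <Kx, Ky>_F, i.e. HSIC up to scaling
    return hsic / (np.linalg.norm(Kx) * np.linalg.norm(Ky))

def mmd_rbf(X, Y, sigma=1.0):
    """Biased empirical MMD^2 with an RBF kernel; X and Y must share a feature dimension."""
    def k(A, B):
        # Pairwise squared Euclidean distances, then the Gaussian kernel.
        d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-d / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Hypothetical usage on one batch of 128 samples; equal dims so MMD applies directly.
Xt = np.random.randn(128, 256)  # stand-in for teacher features
Xs = np.random.randn(128, 256)  # stand-in for student features
print(f"CKA:   {linear_cka(Xt, Xs):.3f}")
print(f"MMD^2: {mmd_rbf(Xt, Xs):.3f}")
```

Note that linear CKA tolerates differing feature dimensions because it compares n-by-n Gram matrices, whereas MMD compares samples in a shared space; in practice distillation methods insert a projection when teacher and student widths differ.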