In this paper, we propose a new distillation method for transferring knowledge from Large Foundation Models (LFMs) into lightweight models, introducing a novel supervision paradigm that requires no manually annotated data. While LFMs exhibit exceptional zero-shot classification abilities across datasets, relying solely on LFM-generated embeddings for distillation poses two main challenges: the LFM's task-irrelevant knowledge and the high density of its features. Transferring task-irrelevant knowledge can compromise the student model's discriminative capability, while the high feature density within target domains obstructs the extraction of the discriminative knowledge essential for the task. To address these issues, we introduce the Proxy Relational Graph (PRG) method. We first extract task-relevant knowledge from the LFM by computing a weighted average of the logits obtained through text prompt embeddings. We then construct sample-class proxy graphs for the LFM and the student model, respectively, to model the correlation between samples and class proxies. Finally, we distill selective knowledge by aligning the relational graphs produced by the LFM and the student model. Specifically, distillation from the LFM to the student is achieved through two types of alignment: 1) aligning the sample nodes produced by the student model with those produced by the LFM, and 2) aligning the edge relationships in the student model's graph with those in the LFM's graph. Our experimental results validate the effectiveness of PRG, demonstrating its ability to leverage the extensive knowledge base of LFMs while circumventing their inherent limitations in focused learning scenarios. Notably, in our annotation-free framework, PRG achieves an accuracy of 76.23\% (T: 77.9\%) on CIFAR-100 and 72.44\% (T: 75.3\%) on ImageNet-1K.
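The pipeline described above can be sketched in code. This is a minimal, illustrative NumPy sketch, not the authors' implementation: the tensor shapes, the use of cosine similarity for graph edges, the mean-squared node/edge losses, and the `beta` weighting are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def task_relevant_logits(img_emb, prompt_emb, prompt_weights):
    """Task-relevant knowledge extraction (assumed form): weighted average
    of logits over multiple text prompts per class.
    img_emb: (B, D) image embeddings; prompt_emb: (P, C, D) text prompt
    embeddings, P prompts per class; prompt_weights: (P,)."""
    logits = np.einsum('bd,pcd->pbc', img_emb, prompt_emb)  # (P, B, C)
    w = prompt_weights / prompt_weights.sum()
    return np.einsum('p,pbc->bc', w, logits)                # (B, C)

def proxy_graph(sample_emb, class_proxies):
    """Sample-class proxy graph: edges as cosine similarity between
    sample nodes (B, D) and class proxies (C, D)."""
    s = sample_emb / np.linalg.norm(sample_emb, axis=1, keepdims=True)
    p = class_proxies / np.linalg.norm(class_proxies, axis=1, keepdims=True)
    return s @ p.T                                          # (B, C)

def prg_loss(student_nodes, teacher_nodes,
             student_edges, teacher_edges, beta=1.0):
    """Combined distillation loss (assumed form): align student sample
    nodes with the LFM's, and student edges with the LFM's."""
    node_loss = np.mean((student_nodes - teacher_nodes) ** 2)
    edge_loss = np.mean((softmax(student_edges) - softmax(teacher_edges)) ** 2)
    return node_loss + beta * edge_loss
```

In this sketch the student's node and edge terms both vanish when its graph matches the teacher's exactly, so the loss is zero at perfect alignment; `beta` trades off node versus edge alignment.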