Knowledge amalgamation (KA) aims to learn a compact student model that handles the joint objective of multiple teacher models, each specialized for its own task. Existing methods only coarsely align teachers and students in a common representation space, which makes it difficult for the student to learn proper decision boundaries from a set of heterogeneous teachers. Moreover, the KL divergence used in previous work minimizes only the difference between the probability distributions of the teachers and the student, ignoring the intrinsic characteristics of the teachers. We therefore propose a novel Contrastive Knowledge Amalgamation (CKA) framework, which introduces contrastive losses and an alignment loss to achieve intra-class cohesion and inter-class separation. Intra-model and inter-model contrastive losses are designed to widen the distance between representations of different classes. The alignment loss minimizes the sample-level distribution differences between teacher and student models in the common representation space. Furthermore, in the task-level amalgamation, the student learns heterogeneous unsupervised classification tasks efficiently and flexibly through soft targets. Extensive experiments on benchmarks demonstrate the generalization capability of CKA in amalgamating a specific task as well as multiple tasks. Comprehensive ablation studies provide further insight into CKA.
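The abstract names three ingredients (soft-target KL divergence, contrastive losses, and an alignment loss) without giving their formulas. The following is a minimal sketch of what such terms could look like, not the authors' implementation. Every function name, the temperature values, the supervised-contrastive form of the contrastive term, the use of pseudo-labels, and the mean-squared form of the alignment term are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_soft_target(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cosine(u, v):
    """Cosine similarity; the small constant guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def contrastive_loss(features, labels, tau=0.5):
    """Supervised-contrastive-style loss (an assumed stand-in).

    Samples sharing a label are positives, all others are negatives.
    In an unsupervised setting, `labels` could be pseudo-labels taken
    from teacher predictions (a hypothetical choice).
    """
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {j: math.exp(cosine(features[i], features[j]) / tau)
                for j in range(n) if j != i}
        denom = sum(sims.values())
        total += -sum(math.log(sims[j] / denom) for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def alignment_loss(teacher_feats, student_feats):
    """Mean squared distance between paired teacher and student features
    (an assumed sample-level alignment term)."""
    sq = [sum((a - b) ** 2 for a, b in zip(t, s))
          for t, s in zip(teacher_feats, student_feats)]
    return sum(sq) / len(sq)
```

Under these assumptions, a training objective could be a weighted sum of the three terms, with the weights treated as further hyperparameters. The contrastive term is small when same-class representations cluster together and large when they are interleaved with other classes, which is the intra-class cohesion and inter-class separation behaviour the abstract describes.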