Knowledge distillation transfers knowledge from a larger, more powerful model (the teacher) to a simpler one (the student). Many current approaches have the student imitate the teacher's knowledge directly. However, these methods tend to learn the features of every spatial location indiscriminately, so the learned representations remain redundant. Inspired by human cognition, and aiming to derive a more compact representation (the concept feature) from the teacher, we propose Generative Denoise Distillation (GDD): stochastic noise is added to the student's concept feature, and a shallow network then embeds the noisy concept feature into a generated instance feature. The generated instance feature is then aligned with the teacher's instance-level knowledge. We run extensive experiments on object detection, instance segmentation, and semantic segmentation to demonstrate the versatility and effectiveness of our method. Notably, GDD achieves new state-of-the-art performance on all three tasks. In semantic segmentation on the Cityscapes dataset (20 categories), GDD raises the mIoU of ResNet-18-based PSPNet from 69.85 to 74.67 and of ResNet-18-based DeepLabV3 from 73.20 to 77.69, gains of 4.82 and 4.49 points, respectively. The source code is available at https://github.com/ZhgLiu/GDD.
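The pipeline described in the abstract (perturb the student's concept feature with noise, pass it through a shallow network, align the result with the teacher's feature) can be illustrated with a minimal NumPy sketch. The abstract does not specify the noise distribution, the shallow network's architecture, or the alignment loss, so everything below is an assumption made for illustration: Gaussian noise, two per-location linear maps (equivalent to 1x1 convolutions) with a ReLU in between, and a mean-squared-error loss. The function names `add_gaussian_noise`, `shallow_generator`, and `gdd_loss` are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def add_gaussian_noise(feat, sigma, rng):
    """Return feat plus zero-mean Gaussian noise with standard deviation sigma.

    Assumption: the abstract only says "stochastic noises"; Gaussian is a guess.
    """
    return feat + sigma * rng.standard_normal(feat.shape)


def shallow_generator(feat, w1, w2):
    """Map a (C, H, W) feature to (C_out, H, W) with two per-location linear
    maps (1x1-convolution equivalents) and a ReLU in between.

    Assumption: stand-in for the paper's "shallow network".
    w1 has shape (C_hidden, C); w2 has shape (C_out, C_hidden).
    """
    hidden = np.maximum(np.einsum("oc,chw->ohw", w1, feat), 0.0)
    return np.einsum("oc,chw->ohw", w2, hidden)


def gdd_loss(student_feat, teacher_feat, w1, w2, sigma, rng):
    """Noise the student feature, generate an instance feature from it, and
    measure its mean squared error against the teacher feature.

    Assumption: MSE is used here as a generic alignment loss.
    """
    noisy = add_gaussian_noise(student_feat, sigma, rng)
    generated = shallow_generator(noisy, w1, w2)
    return float(np.mean((generated - teacher_feat) ** 2))


# Toy usage with random tensors in place of real backbone features.
rng = np.random.default_rng(0)
student = rng.standard_normal((8, 4, 4))    # student feature: 8 channels
teacher = rng.standard_normal((16, 4, 4))   # teacher feature: 16 channels
w1 = 0.1 * rng.standard_normal((16, 8))     # student channels -> hidden
w2 = 0.1 * rng.standard_normal((16, 16))    # hidden -> teacher channels
loss = gdd_loss(student, teacher, w1, w2, sigma=0.5, rng=rng)
print(f"alignment loss: {loss:.4f}")
```

In a real training loop the generator weights and the student backbone would be updated by backpropagating this loss alongside the task loss. The sketch only evaluates the forward pass, and the released code at the URL above should be treated as the reference for the actual design.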