Generic-to-Specific Distillation of Masked Autoencoders

Large vision Transformers (ViTs) driven by self-supervised pre-training mechanisms have achieved unprecedented progress. Lightweight ViT models, however, are limited by their capacity and benefit little from those pre-training mechanisms. Knowledge distillation defines a paradigm to transfer representations from large (teacher) models to small (student) ones. However, conventional single-stage distillation easily gets stuck on task-specific transfer, failing to retain the task-agnostic knowledge crucial for model generalization. In this study, we propose generic-to-specific distillation (G2SD) to tap the potential of small ViT models under the supervision of large models pre-trained by masked autoencoders. In generic distillation, the decoder of the small model is encouraged to align its feature predictions with the hidden representations of the large model, so that task-agnostic knowledge can be transferred. In specific distillation, the predictions of the small model are constrained to be consistent with those of the large model, to transfer the task-specific features that guarantee task performance. With G2SD, the vanilla ViT-Small model achieves 98.7%, 98.1%, and 99.3% of the performance of its teacher (ViT-Base) on image classification, object detection, and semantic segmentation, respectively, setting a solid baseline for two-stage vision distillation. Code will be available at https://github.com/pengzhiliang/G2SD.
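
The abstract does not state the exact loss functions, so the following is only a minimal sketch of the two stages under stated assumptions: mean squared error for the feature alignment in generic distillation, and a temperature-scaled KL divergence for the prediction consistency in specific distillation. The function names are hypothetical, and plain Python lists stand in for the feature and logit tensors.

```python
import math


def feature_alignment_loss(student_feats, teacher_feats):
    """Generic stage (assumed form): mean squared error between the
    student decoder's feature predictions and the teacher's hidden
    representations. Both inputs are lists of per-token feature lists."""
    total, count = 0.0, 0
    for s_tok, t_tok in zip(student_feats, teacher_feats):
        for s, t in zip(s_tok, t_tok):
            total += (s - t) ** 2
            count += 1
    return total / count


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def prediction_consistency_loss(student_logits, teacher_logits, temperature=1.0):
    """Specific stage (assumed form): KL(teacher || student) between the
    softened class distributions of teacher and student."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


# Stage 1: align student feature predictions with teacher features.
generic = feature_alignment_loss([[0.0, 0.0]], [[1.0, 1.0]])

# Stage 2: keep student predictions consistent with teacher predictions.
specific = prediction_consistency_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
```

In this sketch both losses are zero when the student matches the teacher exactly and positive otherwise; running the stages in sequence mirrors the generic-then-specific ordering described above.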

