Knowledge distillation (KD) is a model compression method that trains a compact student model to emulate the performance of a more complex teacher model. However, the architectural capacity gap between the two models limits the effectiveness of knowledge transfer. Previous works have addressed this issue by customizing teacher-student pairs to improve compatibility, a computationally expensive process that must be repeated every time either model changes. These methods are therefore impractical when a single teacher model has to be compressed into different student models for deployment on multiple hardware devices with distinct resource constraints. In this work, we propose Generic Teacher Network (GTN), a one-off KD-aware training scheme that creates a generic teacher capable of effectively transferring knowledge to any student model sampled from a given finite pool of architectures. To this end, we represent the student pool as a weight-sharing supernet and condition our generic teacher to align with the capacities of the various student architectures sampled from this supernet. Experimental evaluation shows that our method both improves overall KD effectiveness and amortizes the minimal additional training cost of the generic teacher across the students in the pool.
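To make the underlying transfer mechanism concrete, the sketch below shows the standard soft-label distillation loss (temperature-scaled KL divergence between teacher and student output distributions) that KD methods, including teacher-aware ones like GTN, build upon. This is a generic illustration, not the paper's GTN objective; the function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """Soft-label KD loss: T^2 * KL(p_teacher || p_student).

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures; temperature=4.0 is an illustrative choice.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(temperature ** 2 * np.sum(p * (np.log(p) - np.log(q))))
```

A capacity-aligned teacher, as proposed in the abstract, would be trained so that this loss is low and informative for every student sampled from the supernet pool, rather than for one fixed student architecture.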