Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the student's architecture, has been garnering increasing attention. KD methods fall into two primary categories: feature-based methods, which focus on intermediate-layer features, and logits-based methods, which target the final layer's logits. This paper introduces a novel perspective that leverages diverse knowledge sources within a unified KD framework. Specifically, we aggregate features from intermediate layers into a comprehensive representation, gathering semantic information from different stages and scales, and then predict distribution parameters from this representation. These steps transform the knowledge in the intermediate layers into corresponding distributional forms, allowing knowledge to be distilled through a unified distribution constraint at different stages of the network and ensuring the comprehensiveness and coherence of knowledge transfer. Extensive experiments validate the effectiveness of the proposed method.
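To make the described pipeline concrete, the following is a minimal PyTorch sketch of the idea as stated in the abstract: pooled intermediate-stage features are aggregated into a single representation, a small head predicts Gaussian distribution parameters from it, and the student's predicted distribution is matched to the teacher's with a KL divergence. All names here (DistributionHead, gaussian_kl, the embedding dimension, the choice of a diagonal Gaussian) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the described pipeline; module names, the Gaussian
# parameterization, and the aggregation scheme are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistributionHead(nn.Module):
    """Aggregates multi-stage features and predicts Gaussian parameters."""

    def __init__(self, stage_dims, embed_dim=128):
        super().__init__()
        # One projection per stage so features of different widths
        # map into a common embedding space.
        self.projs = nn.ModuleList(nn.Linear(d, embed_dim) for d in stage_dims)
        self.mu = nn.Linear(embed_dim, embed_dim)
        self.log_var = nn.Linear(embed_dim, embed_dim)

    def forward(self, feats):
        # feats: list of stage feature maps, each of shape (B, C_i, H_i, W_i).
        pooled = [F.adaptive_avg_pool2d(f, 1).flatten(1) for f in feats]
        # Aggregate the projected stage embeddings into one representation.
        z = torch.stack([p(x) for p, x in zip(self.projs, pooled)]).mean(0)
        return self.mu(z), self.log_var(z)


def gaussian_kl(mu_s, log_var_s, mu_t, log_var_t):
    """KL(student || teacher) between diagonal Gaussians, averaged over the batch."""
    var_s, var_t = log_var_s.exp(), log_var_t.exp()
    kl = 0.5 * (log_var_t - log_var_s + (var_s + (mu_s - mu_t) ** 2) / var_t - 1)
    return kl.sum(dim=1).mean()


# Usage: distill by matching the student's predicted distribution to the
# teacher's (teacher outputs detached so gradients flow only to the student).
student_head = DistributionHead(stage_dims=[64, 128, 256])
teacher_head = DistributionHead(stage_dims=[256, 512, 1024])
s_feats = [torch.randn(8, c, 16, 16) for c in (64, 128, 256)]
t_feats = [torch.randn(8, c, 16, 16) for c in (256, 512, 1024)]
loss = gaussian_kl(*student_head(s_feats),
                   *(g.detach() for g in teacher_head(t_feats)))
```

Because every stage's knowledge is expressed in the same distributional form, a single KL-style constraint of this kind can be applied uniformly across the network's stages, which is the unification the abstract refers to.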