Existing knowledge distillation methods generally adopt a teacher-student paradigm, in which the student network learns solely from a well-trained teacher. However, this paradigm overlooks the inherent difference in learning ability between the teacher and student networks, which gives rise to the capacity-gap problem. To address this limitation, we propose a novel method called SLKD.