Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve accuracies similar to those of their teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties, such as basing their predictions on the same input features, as this ensures that students learn the 'right features' from the teachers. In this work, we explore whether this can be achieved by optimizing not only the classic KD loss but also the similarity of the explanations generated by the teacher and the student. Although the idea is simple and intuitive, we find that our proposed 'explanation-enhanced' KD (e$^2$KD) (1) consistently provides large gains in terms of accuracy and student-teacher agreement, (2) ensures that the student learns from the teacher to be right for the right reasons and to give similar explanations, and (3) is robust with respect to the model architecture and the amount of training data, and even works with 'approximate', pre-computed explanations.
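The combined objective described above can be sketched as a classic KD term plus a weighted explanation-similarity penalty. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation. It assumes that explanations are given as flattened attribution maps (for example, pre-computed ones), that similarity is measured with cosine similarity, and that the two terms are combined with a hypothetical weight `lam`. The function names, the temperature default, and `lam` are all hypothetical choices.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """Classic KD term: T^2 * KL(p_teacher || p_student) on softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


def explanation_loss(student_expl: Sequence[float], teacher_expl: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Assumed explanation term: 1 - cosine similarity of flattened attribution maps."""
    dot = sum(a * b for a, b in zip(student_expl, teacher_expl))
    norm_s = math.sqrt(sum(a * a for a in student_expl))
    norm_t = math.sqrt(sum(b * b for b in teacher_expl))
    return 1.0 - dot / (norm_s * norm_t + eps)


def e2kd_objective(student_logits: Sequence[float], teacher_logits: Sequence[float],
                   student_expl: Sequence[float], teacher_expl: Sequence[float],
                   temperature: float = 4.0, lam: float = 1.0) -> float:
    """KD loss plus a weighted explanation penalty (lam is a hypothetical weight)."""
    return (kd_loss(student_logits, teacher_logits, temperature)
            + lam * explanation_loss(student_expl, teacher_expl))


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]
    student_logits = [1.5, 0.7, -0.8]
    teacher_expl = [0.9, 0.1, 0.0, 0.4]  # stand-in for a flattened attribution map
    student_expl = [0.8, 0.2, 0.1, 0.3]
    print(e2kd_objective(student_logits, teacher_logits, student_expl, teacher_expl))
```

Because the explanation term only compares two attribution maps, the teacher's map can be computed once and stored. This is consistent with the abstract's observation that pre-computed explanations also work. In a real training loop, the student's explanation would have to be differentiable with respect to its parameters, which this scalar sketch does not model.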