Knowledge distillation has been widely adopted across a variety of tasks and has achieved remarkable success. Since its inception, many researchers have been intrigued by the dark knowledge hidden in the teacher model's outputs. A recent study showed that knowledge distillation and label smoothing can be unified as learning from soft labels. Consequently, measuring the effectiveness of soft labels has become an important question. Most existing theories impose stringent constraints on the teacher model or the data distribution, and many of their assumptions imply that the soft labels are close to the ground-truth labels. This paper studies whether biased soft labels are still effective. We present two more comprehensive indicators for measuring the effectiveness of such soft labels. Based on these two indicators, we give sufficient conditions under which learners trained on biased soft labels are classifier-consistent and ERM-learnable. We apply the theory to three weakly supervised learning frameworks. Experimental results show that biased soft labels can also teach good students, corroborating the soundness of the theory.
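The claim that knowledge distillation and label smoothing are both instances of learning from soft labels can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's method: label smoothing is taken in its standard form (one-hot label mixed with the uniform distribution), the distillation target is the teacher's temperature-softened softmax, and in both cases the student minimizes cross-entropy against the soft target. The function names, the temperature of 4.0, epsilon of 0.1, and the example logits are hypothetical choices. The sketch also omits common distillation details such as the T² loss scaling and mixing in a hard-label term.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by subtracting the max logit."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothing_target(true_class, num_classes, epsilon=0.1):
    """Label smoothing: mix the one-hot label with the uniform distribution."""
    return [
        (1.0 - epsilon) * (1.0 if k == true_class else 0.0) + epsilon / num_classes
        for k in range(num_classes)
    ]


def distillation_target(teacher_logits, temperature=4.0):
    """Knowledge distillation: the teacher's temperature-softened prediction."""
    return softmax(teacher_logits, temperature)


def soft_label_cross_entropy(soft_target, student_logits):
    """Cross-entropy -sum_k q_k log p_k between a soft label q and the student's p.

    Simplification (assumption): the student's softmax uses temperature 1.
    """
    probs = softmax(student_logits)
    return -sum(q * math.log(p) for q, p in zip(soft_target, probs))


if __name__ == "__main__":
    student_logits = [2.0, 0.5, -1.0]  # hypothetical student scores; true class is 0
    targets = {
        "label smoothing": label_smoothing_target(true_class=0, num_classes=3),
        # Hypothetical teacher whose top class (1) differs from the true class (0),
        # i.e. a soft label that is far from the ground truth.
        "distillation": distillation_target([1.0, 3.0, 0.0], temperature=4.0),
    }
    for name, target in targets.items():
        loss = soft_label_cross_entropy(target, student_logits)
        print(f"{name}: target={[round(q, 3) for q in target]}, loss={loss:.4f}")
```

Because both methods reduce to the same loss and differ only in how the soft target is produced, a theory stated in terms of soft labels covers both, including targets whose largest entry is not the true class.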