Online knowledge distillation (KD) has received increasing attention in recent years. However, most existing online KD methods focus on developing complicated model structures and training strategies to improve the distillation of high-level knowledge, such as probability distributions, while the effects of multi-level knowledge in online KD, especially low-level knowledge, are largely overlooked. To provide a new viewpoint on online KD, we propose MetaMixer, a regularization strategy that strengthens distillation by combining low-level knowledge, which affects the localization capability of the networks, with high-level knowledge, which focuses on the whole image. Experiments under different conditions show that MetaMixer achieves significant performance gains over state-of-the-art methods.