Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to compact student models. However, existing KD methods for super-resolution (SR) networks overlook a defining property of the SR task: the teacher model's outputs are only noisy approximations of the ground-truth (GT) distribution of high-quality images, which obscures the teacher's knowledge and limits the effect of distillation. To exploit the teacher model beyond the GT upper bound, we present Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher's knowledge to the student through upcycled in-domain data derived from the training data. In addition, we impose label-consistency regularization on KD for SR via paired invertible augmentations, improving the student model's performance and robustness. Comprehensive experiments demonstrate that DUKD significantly outperforms prior methods on several SR tasks.
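To make the two ingredients concrete, below is a minimal PyTorch sketch of one training step. It assumes "upcycling" means deriving new in-domain LR inputs from the training data (here by further bicubic downsampling), on which no GT exists and the teacher's prediction supervises the student directly; the invertible augmentation is a horizontal flip, whose inverse is itself. The function and argument names, loss choices, and the 0.5 downsampling factor are illustrative assumptions, not the paper's exact pipeline.

```python
import torch
import torch.nn.functional as F

def dukd_step(student, teacher, lr, hr, kd_weight=1.0, lc_weight=1.0):
    """One hedged DUKD-style step: (i) supervised reconstruction on the
    original LR-HR pair, (ii) distillation on an upcycled LR input where
    the teacher output is the only available target, and (iii) a
    label-consistency term via a paired invertible augmentation.

    `student` and `teacher` map LR tensors (B, C, h, w) to SR tensors
    at a higher resolution; both models and all weights are assumed.
    """
    # Supervised reconstruction loss on the original training pair.
    sr = student(lr)
    loss_rec = F.l1_loss(sr, hr)

    # "Upcycle" the training data: derive a new in-domain LR input by
    # further downsampling. It has no GT, so the teacher's prediction
    # supervises the student directly, beyond the GT upper bound.
    lr_up = F.interpolate(lr, scale_factor=0.5, mode="bicubic",
                          align_corners=False)
    with torch.no_grad():
        target_up = teacher(lr_up)
    loss_kd = F.l1_loss(student(lr_up), target_up)

    # Label-consistency regularization with a paired invertible
    # augmentation: flip the input, super-resolve, invert the flip, and
    # require the result to match the target for the unflipped input.
    sr_aug = student(torch.flip(lr_up, dims=[-1]))
    loss_lc = F.l1_loss(torch.flip(sr_aug, dims=[-1]), target_up)

    return loss_rec + kd_weight * loss_kd + lc_weight * loss_lc
```

The key design point the sketch tries to capture is that the distillation and consistency terms are computed on inputs for which no GT label exists, so the teacher's knowledge is used where it cannot be shadowed by the GT supervision.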