Knowledge distillation (KD) has emerged as a promising yet challenging technique for compressing deep neural networks, aiming to transfer the rich learned representations of capable but computationally intensive teacher models to compact student models. However, current KD methods for super-resolution (SR) models deliver limited performance and have restricted applicability because they overlook the characteristics of SR tasks. In this paper, we approach the problem from the perspective of effective data utilization and propose Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher's prior knowledge to the student through upcycled in-domain data derived from the input images. In addition, for the first time, we realize label consistency regularization in KD for SR models, implemented through paired invertible data augmentations. This regularization constrains the KD training process and improves the generalization capability of the student model. Owing to its versatility, DUKD can be applied across a broad spectrum of teacher-student architectures (e.g., CNN and Transformer models) and SR tasks, such as single image SR, real-world SR, and SR quantization, and it can be combined with other compression techniques. Comprehensive experiments on diverse benchmarks demonstrate that DUKD significantly outperforms prior art.
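To make the label consistency idea concrete, the following is a minimal sketch of one plausible reading of "paired invertible data augmentations": the student receives an augmented low-resolution input, its output is mapped back with the inverse augmentation, and the result is compared with the teacher's output on the original input. Everything here is an illustrative assumption rather than the method's actual implementation: the names `hflip`, `upsample2x`, `l1`, and `label_consistency_loss` are hypothetical, the "networks" are a toy nearest-neighbour upsampler, and the L1 distance is an assumed choice of loss.

```python
from typing import Callable, List

Image = List[List[float]]  # toy single-channel image as nested lists


def hflip(img: Image) -> Image:
    """Horizontal flip. It is its own inverse, so it is an invertible augmentation."""
    return [row[::-1] for row in img]


def upsample2x(img: Image) -> Image:
    """Toy stand-in for an SR network (assumption): nearest-neighbour 2x upsampling."""
    out: Image = []
    for row in img:
        wide = [v for v in row for _ in range(2)]  # repeat each pixel horizontally
        out.append(wide)
        out.append(list(wide))  # repeat the widened row vertically
    return out


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two images of equal size."""
    diffs = [abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def label_consistency_loss(
    student: Callable[[Image], Image],
    teacher: Callable[[Image], Image],
    lr: Image,
    aug: Callable[[Image], Image],
    inv_aug: Callable[[Image], Image],
) -> float:
    """Compare inv_aug(student(aug(lr))) against teacher(lr)."""
    target = teacher(lr)              # teacher sees the original input
    pred = inv_aug(student(aug(lr)))  # student sees the augmented input
    return l1(pred, target)


if __name__ == "__main__":
    lr = [[0.0, 1.0], [2.0, 3.0]]
    # A flip-equivariant student that matches the teacher incurs zero loss.
    print(label_consistency_loss(upsample2x, upsample2x, lr, hflip, hflip))
```

In this sketch, a student whose output changes inconsistently under the augmentation is penalized even if it matches the teacher on un-augmented inputs, which is the sense in which the regularization constrains training.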