Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network by learning from a high-performance but more complex teacher network. In the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, and they rely on training strategies with a single teacher and simple loss functions. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) framework designed specifically for image super-resolution. It exploits the advantages of multiple teachers by combining and enhancing the outputs of these teacher models; the resulting output then guides the learning process of the compact student network. To achieve more effective learning, we have also developed a new wavelet-based loss function for MTKD, which better optimizes the training process by observing differences in both the spatial and frequency domains. We evaluate the effectiveness of the proposed method by comparing it with five commonly used KD methods for image super-resolution on three popular network architectures. The results show that the proposed MTKD method achieves clear improvements in super-resolution performance, up to 0.46 dB in PSNR, over state-of-the-art KD approaches across different network structures. The source code of MTKD will be made publicly available for evaluation.
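To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the MTKD implementation. It assumes grayscale images stored as 2D arrays with even height and width. The function names `haar_dwt2`, `combine_teachers`, and `spatial_wavelet_l1`, the plain averaging of teacher outputs, the single-level Haar transform, and the `wavelet_weight` parameter are all illustrative assumptions; the abstract does not specify how the teacher outputs are combined and enhanced, nor which wavelet the loss uses.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W must be even. Returns four (H/2, W/2) subbands: the
    low-pass approximation and three detail bands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    row_detail = (a + b - c - d) / 2.0  # difference between block rows
    col_detail = (a - b + c - d) / 2.0  # difference between block columns
    diag_detail = (a - b - c + d) / 2.0
    return approx, row_detail, col_detail, diag_detail


def combine_teachers(outputs):
    """Stand-in for teacher fusion: a pixel-wise average (an assumption)."""
    return np.mean(np.stack(outputs, axis=0), axis=0)


def spatial_wavelet_l1(student, target, wavelet_weight=1.0):
    """L1 distance in the pixel domain plus L1 distance on Haar subbands."""
    spatial = np.mean(np.abs(student - target))
    frequency = sum(
        np.mean(np.abs(s - t))
        for s, t in zip(haar_dwt2(student), haar_dwt2(target))
    )
    return spatial + wavelet_weight * frequency


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teachers = [rng.random((8, 8)) for _ in range(3)]  # hypothetical teacher outputs
    student = rng.random((8, 8))                       # hypothetical student output
    target = combine_teachers(teachers)
    print(spatial_wavelet_l1(student, target))
```

In this sketch the student is penalized both for pixel-wise error and for error in each wavelet subband, so a mismatch concentrated in high-frequency detail contributes to the loss even when the average pixel error is small.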