Diffusion models for super-resolution (SR) produce high-quality visual results but incur high computational costs. Several methods have been developed to accelerate diffusion-based SR models, but some (e.g., SinSR) fail to produce realistic perceptual details, while others (e.g., OSEDiff) may hallucinate non-existent structures. To overcome these issues, we present RSD, a new distillation method for ResShift. Our method trains the student network to produce images such that a new fake ResShift model trained on them coincides with the teacher model. RSD achieves single-step restoration and outperforms the teacher by a noticeable margin on various perceptual metrics (LPIPS, CLIPIQA, MUSIQ). We show that our distillation method surpasses SinSR, the other distillation-based method for ResShift, making it on par, in terms of perceptual quality, with state-of-the-art diffusion SR distillation methods with limited computational costs. Compared to SR methods based on pre-trained text-to-image models, RSD produces competitive perceptual quality while requiring fewer parameters, less GPU memory, and lower training cost. We provide experimental results on various real-world and synthetic datasets, including RealSR, RealSet65, DRealSR, ImageNet, and DIV2K. The code is available at https://github.com/Daniil-Selikhanovych/RSD.
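The training principle stated in the abstract can be read as a fixed-point condition. The notation below is an assumption introduced only for illustration and does not appear in the text: $G_\theta$ is the one-step student, $f^{*}$ is the ResShift teacher, $p_\theta$ is the distribution of student outputs, and $\mathcal{L}_{\mathrm{ResShift}}(f; p)$ is the standard ResShift training loss evaluated on images drawn from $p$. Under these assumptions, a schematic statement is:

```latex
\[
  f_{\theta} \;=\; \arg\min_{f}\; \mathcal{L}_{\mathrm{ResShift}}\bigl(f;\, p_{\theta}\bigr),
  \qquad
  \text{find } \theta \text{ such that } f_{\theta} = f^{*}.
\]
```

Here $f_\theta$ plays the role of the "fake" ResShift model trained on student outputs. How the mismatch between $f_\theta$ and $f^{*}$ is measured and minimized in practice is not specified in this abstract.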