Applying a Diffusion Probabilistic Model (DPM) directly to image super-resolution is wasteful, because a simple Convolutional Neural Network (CNN) can already recover the main low-frequency content. We therefore present ResDiff, a novel DPM based on a residual structure for Single Image Super-Resolution (SISR). ResDiff combines a CNN, which restores the primary low-frequency components, with a DPM, which predicts the residual between the ground-truth image and the CNN-predicted image. Common diffusion-based methods use the low-resolution (LR) image directly to guide the noise towards the high-resolution (HR) space. In contrast, ResDiff uses the CNN's initial prediction to guide the noise towards the residual space between the HR space and the CNN-predicted space. This not only accelerates the generation process but also yields superior sample quality. Additionally, a frequency-domain-based loss function is introduced for the CNN to facilitate its restoration, and a frequency-domain guided diffusion is designed for the DPM to predict high-frequency details. Extensive experiments on multiple benchmark datasets demonstrate that ResDiff outperforms previous diffusion-based methods in terms of shorter model convergence time, superior generation quality, and more diverse samples.
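The residual decomposition described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not confirm. A nearest-neighbour upsampler stands in for the CNN. The standard DDPM forward process with a linear beta schedule is applied to the residual. An L1 distance between 2-D FFT spectra serves as one hypothetical form of a frequency-domain loss. All function names are hypothetical, and none of this should be read as ResDiff's actual implementation.

```python
import numpy as np

def toy_upsample(lr, scale):
    """Hypothetical stand-in for the CNN: nearest-neighbour upsampling of the LR image."""
    return np.repeat(np.repeat(lr, scale, axis=0), scale, axis=1)

def residual_target(hr, cnn_pred):
    """What the DPM learns to generate: the gap between ground truth and CNN output."""
    return hr - cnn_pred

def q_sample(r0, t, alphas_cumprod, noise):
    """Assumed DDPM forward process on the residual:
    r_t = sqrt(abar_t) * r_0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t]
    return np.sqrt(abar) * r0 + np.sqrt(1.0 - abar) * noise

def frequency_l1_loss(pred, target):
    """Hypothetical frequency-domain loss: mean |FFT(pred) - FFT(target)| over all frequencies."""
    pred_f = np.fft.fft2(pred, axes=(-2, -1))
    target_f = np.fft.fft2(target, axes=(-2, -1))
    return float(np.mean(np.abs(pred_f - target_f)))

def reconstruct(cnn_pred, predicted_residual):
    """Final SR output: CNN prediction plus the residual produced by the DPM."""
    return cnn_pred + predicted_residual

# Usage on a toy 8x8 "HR" image with 2x downsampling by block averaging.
rng = np.random.default_rng(0)
hr = rng.random((8, 8))
lr = hr.reshape(4, 2, 4, 2).mean(axis=(1, 3))  # average each 2x2 block
cnn_pred = toy_upsample(lr, 2)
r0 = residual_target(hr, cnn_pred)

betas = np.linspace(1e-4, 0.02, 1000)           # assumed linear schedule
alphas_cumprod = np.cumprod(1.0 - betas)
noise = rng.standard_normal(r0.shape)
r_t = q_sample(r0, 500, alphas_cumprod, noise)  # noisy residual at step t = 500

sr = reconstruct(cnn_pred, r0)  # a perfect residual predictor recovers HR exactly
```

In this toy setup, the upsampler already reproduces every 2x2 block mean. The residual therefore sums to zero and its DC (zero-frequency) FFT coefficient vanishes. This is a small illustration of the motivation above: the low-frequency content is handled before diffusion, so the diffusion model only has to generate the remaining detail.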