Adapting the Diffusion Probabilistic Model (DPM) for direct image super-resolution is wasteful, since a simple Convolutional Neural Network (CNN) can already recover the main low-frequency content. We therefore present ResDiff, a novel residual-structure-based Diffusion Probabilistic Model for Single Image Super-Resolution (SISR). ResDiff combines a CNN, which restores the primary low-frequency components, with a DPM, which predicts the residual between the ground-truth image and the CNN-predicted image. In contrast to common diffusion-based methods that use the low-resolution (LR) image directly to guide the noise towards the high-resolution (HR) space, ResDiff uses the CNN's initial prediction to direct the noise towards the residual space between the HR space and the CNN-predicted space, which both accelerates the generation process and yields superior sample quality. Additionally, a frequency-domain-based loss function is introduced for the CNN to facilitate its restoration, and a frequency-domain guided diffusion is designed for the DPM to help it predict high-frequency details. Extensive experiments on multiple benchmark datasets demonstrate that ResDiff outperforms previous diffusion-based methods, with shorter model convergence time, superior generation quality, and more diverse samples.
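The residual decomposition described in the abstract can be illustrated with a toy sketch. This is not the ResDiff implementation: a 1-D signal stands in for an image, nearest-neighbour upsampling stands in for the CNN, no diffusion model is trained, and `frequency_l1` is a hypothetical frequency-domain loss chosen only for illustration. All function names are assumptions introduced here.

```python
import cmath
import math


def downsample(signal, factor):
    """Average non-overlapping windows to mimic a low-resolution observation."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def coarse_predict(lr, factor):
    """Stand-in for the CNN (assumption): repeat each LR sample `factor` times."""
    return [value for value in lr for _ in range(factor)]


def dft(signal):
    """Naive discrete Fourier transform, O(n^2), fine for a toy signal."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n)]


def frequency_l1(pred, target):
    """Hypothetical frequency-domain loss: mean absolute difference of DFT coefficients."""
    coeffs_pred, coeffs_target = dft(pred), dft(target)
    return sum(abs(a - b) for a, b in zip(coeffs_pred, coeffs_target)) / len(coeffs_pred)


def energy(signal):
    """Sum of squared sample values."""
    return sum(v * v for v in signal)


# Toy HR signal: a slow wave (low frequency) plus a weak fast ripple (high frequency).
n, factor = 32, 4
hr = [math.sin(2 * math.pi * t / n) + 0.1 * math.sin(2 * math.pi * 8 * t / n)
      for t in range(n)]

lr = downsample(hr, factor)
coarse = coarse_predict(lr, factor)              # low-frequency estimate
residual = [h - c for h, c in zip(hr, coarse)]   # what the DPM would be asked to model
restored = [c + r for c, r in zip(coarse, residual)]  # coarse estimate + residual

print("HR energy:      ", energy(hr))
print("residual energy:", energy(residual))
print("frequency loss of coarse estimate:", frequency_l1(coarse, hr))
```

In this toy setting the residual carries far less energy than the HR signal itself, which gives some intuition for why modelling the residual could be an easier target than modelling the full HR output. In a real pipeline the residual would be sampled from a trained DPM conditioned on the coarse prediction rather than computed from the ground truth.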