Pre-trained text-to-image diffusion models have been increasingly employed to tackle the real-world image super-resolution (Real-ISR) problem due to their powerful generative image priors. Most existing methods start from random noise and reconstruct the high-quality (HQ) image under the guidance of the given low-quality (LQ) image. While promising results have been achieved, such Real-ISR methods require multiple diffusion steps to reproduce the HQ image, increasing the computational cost. Meanwhile, the random noise introduces uncertainty into the output, which is unfriendly to image restoration tasks. To address these issues, we propose a one-step effective diffusion network, namely OSEDiff, for the Real-ISR problem. We argue that the LQ image contains rich information for restoring its HQ counterpart, and hence the given LQ image can be directly taken as the starting point for diffusion, eliminating the uncertainty introduced by random noise sampling. We finetune the pre-trained diffusion network with trainable layers to adapt it to complex image degradations. To ensure that the one-step diffusion model can yield HQ Real-ISR outputs, we apply variational score distillation in the latent space to perform KL-divergence regularization. As a result, our OSEDiff model can efficiently and effectively generate HQ images in just one diffusion step. Our experiments demonstrate that OSEDiff achieves comparable or even better Real-ISR results, in terms of both objective metrics and subjective evaluations, than previous diffusion-model-based Real-ISR methods that require dozens or hundreds of steps. The source code will be released at https://github.com/cswry/OSEDiff.
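The latent-space KL regularization mentioned above can be sketched in the standard variational score distillation (VSD) form. The notation below is illustrative and not necessarily the paper's exact formulation: $E$, $G_\theta$, $\hat z_0$, the two score networks $\epsilon_\phi$ (frozen, pre-trained) and $\epsilon_\psi$ (finetuned to track the generator's outputs), and the timestep weight $\omega(t)$ are our assumed symbols.

```latex
% One-step restoration: the encoded LQ image is mapped directly
% to an HQ latent estimate, with no random-noise starting point.
\hat z_0 = G_\theta\bigl(E(x_{\mathrm{LQ}})\bigr)

% VSD-style gradient of the latent-space KL regularizer:
% the frozen pre-trained score \epsilon_\phi pulls \hat z_0 toward the
% natural-image prior, while the finetuned score \epsilon_\psi models the
% current generator's output distribution; their difference drives \theta.
\nabla_\theta \mathcal{L}_{\mathrm{VSD}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      \omega(t)\,\bigl(\epsilon_\phi(z_t, t, c) - \epsilon_\psi(z_t, t, c)\bigr)
      \,\frac{\partial \hat z_0}{\partial \theta}
    \right],
\qquad
z_t = \sqrt{\bar\alpha_t}\,\hat z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
\quad \epsilon \sim \mathcal{N}(0, I).
```

Here $c$ denotes the text condition and $\bar\alpha_t$ the standard diffusion noise schedule; when the two scores agree on the generator's outputs, the KL term between the generator's latent distribution and the pre-trained prior is minimized.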