Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to restore images with out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model together with a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline composed of multiple common degradations such as blur, resizing, noise, and JPEG compression. We then introduce robust training for a degradation-aware CLIP model to extract enriched image content features that assist high-quality image restoration. We adopt the image restoration SDE (IR-SDE) as our base diffusion model and, building on it, present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task show that the proposed posterior sampling improves image generation quality across various degradations.
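The synthetic degradation pipeline described above (blur, resizing, noise, and JPEG compression applied to clean images) can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: the function name `degrade`, the parameter defaults, and the fixed order of operations are assumptions for the example, using Pillow and NumPy.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def degrade(img: Image.Image, blur_radius: float = 2.0, scale: float = 0.5,
            noise_sigma: float = 10.0, jpeg_quality: int = 30,
            seed: int = 0) -> Image.Image:
    """Simulate a low-quality image: blur -> resize -> noise -> JPEG.

    Illustrative only; real pipelines typically randomize the parameters
    and the order of the degradations per training sample.
    """
    rng = np.random.default_rng(seed)
    w, h = img.size

    # 1. Gaussian blur
    out = img.filter(ImageFilter.GaussianBlur(blur_radius))

    # 2. Resize degradation: downscale, then upscale back to original size
    out = out.resize((max(1, int(w * scale)), max(1, int(h * scale))))
    out = out.resize((w, h))

    # 3. Additive Gaussian noise in float space, then clip back to uint8
    arr = np.asarray(out).astype(np.float32)
    arr += rng.normal(0.0, noise_sigma, arr.shape)
    out = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

    # 4. JPEG compression round-trip through an in-memory buffer
    buf = io.BytesIO()
    out.save(buf, format="JPEG", quality=jpeg_quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```

In practice, each degradation's strength (blur radius, scale factor, noise level, JPEG quality) would be sampled at random per image so that the trained model sees a wide distribution of degradations.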