We present DiffBIR, which leverages pretrained text-to-image diffusion models for the blind image restoration problem. Our framework adopts a two-stage pipeline. In the first stage, we pretrain a restoration module across diverse degradations to improve generalization in real-world scenarios. In the second stage, we leverage the generative ability of latent diffusion models to achieve realistic image restoration. Specifically, we introduce an injective modulation sub-network, LAControlNet, for finetuning, while the pretrained Stable Diffusion model retains its generative ability. Finally, we introduce a controllable module that lets users balance quality and fidelity by applying latent image guidance in the denoising process during inference. Extensive experiments demonstrate the superiority of DiffBIR over state-of-the-art approaches for both blind image super-resolution and blind face restoration on synthetic and real-world datasets. The code is available at https://github.com/XPixelGroup/DiffBIR.
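The quality-fidelity trade-off described above can be illustrated with a minimal sketch. It is not the DiffBIR implementation: it assumes that the guidance amounts to one gradient step that pulls the predicted clean latent toward the latent of the stage-one restored image, with latents flattened to plain lists. The names `latent_distance`, `guide_latent`, and `scale` are hypothetical and chosen only for illustration.

```python
from typing import List


def latent_distance(z: List[float], z_ref: List[float]) -> float:
    """Mean squared distance between two flattened latents."""
    return sum((a - b) ** 2 for a, b in zip(z, z_ref)) / len(z)


def guide_latent(z0_pred: List[float], z_ref: List[float], scale: float) -> List[float]:
    """Take one gradient step on latent_distance with respect to z0_pred.

    Assumption: z_ref stands in for the latent of the stage-one restored
    image, and scale is the user-chosen guidance strength.
    scale = 0 leaves the diffusion prediction untouched (favoring generated
    detail); a larger scale pulls it toward z_ref (favoring fidelity).
    """
    n = len(z0_pred)
    # Gradient of the mean squared distance with respect to each element.
    grad = [2.0 * (a - b) / n for a, b in zip(z0_pred, z_ref)]
    return [a - scale * g for a, g in zip(z0_pred, grad)]


if __name__ == "__main__":
    z0_pred = [1.0, 3.0]   # toy predicted clean latent
    z_ref = [0.0, 0.0]     # toy reference latent
    for scale in (0.0, 0.25, 0.5):
        guided = guide_latent(z0_pred, z_ref, scale)
        print(scale, guided, latent_distance(guided, z_ref))
```

In this toy setting, increasing `scale` moves the guided latent monotonically closer to the reference, which mirrors the user-facing control the abstract describes. In a real sampler such a step would be applied inside each denoising iteration.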