Diffusion models (DMs) have achieved state-of-the-art (SOTA) performance by modeling image synthesis as the sequential application of a denoising network. However, unlike image synthesis, image restoration (IR) is strongly constrained to generate results consistent with the ground truth. Thus, for IR, it is inefficient for traditional DMs to run many iterations of a large model to estimate whole images or feature maps. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. Specifically, DiffIR has two training stages: pretraining and DM training. In pretraining, we input ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) that guides DIRformer. In the second stage, we train the DM to directly estimate the same IPR as the pretrained CPEN$_{S1}$ using only low-quality (LQ) images. We observe that, since the IPR is only a compact vector, DiffIR can use fewer iterations than traditional DMs to obtain accurate estimations and generate more stable and realistic results. Since the iterations are few, DiffIR can jointly optimize CPEN$_{S2}$, DIRformer, and the denoising network, which further reduces the influence of estimation error. We conduct extensive experiments on several IR tasks and achieve SOTA performance at a lower computational cost. Code is available at \url{https://github.com/Zj-BinXia/DiffIR}.
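To illustrate why diffusing over a compact vector can need only a few iterations, the following is a minimal sketch of a short reverse diffusion chain applied to a small vector rather than to an image. It is not the DiffIR implementation: the function names, the 16-dimensional vector, the 4-step linear noise schedule, and the noise-free DDPM posterior-mean update are all assumptions made for illustration, and the trained denoising network is replaced by a hypothetical oracle that knows the clean vector.

```python
import math
import random


def make_schedule(num_steps, beta_start=0.1, beta_end=0.99):
    """Linear beta schedule (assumed values); returns per-step alphas and cumulative products."""
    span = max(num_steps - 1, 1)
    betas = [beta_start + (beta_end - beta_start) * i / span for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return alphas, alpha_bars


def forward_diffuse(z0, alpha_bar_T, rng):
    """Sample z_T from q(z_T | z_0): scale the clean vector and add Gaussian noise."""
    s, n = math.sqrt(alpha_bar_T), math.sqrt(1.0 - alpha_bar_T)
    return [s * z + n * rng.gauss(0.0, 1.0) for z in z0]


def reverse_sample(z_T, predict_noise, alphas, alpha_bars):
    """Run the reverse chain from the last step to the first using the
    DDPM posterior mean, without injecting fresh noise at each step."""
    z = list(z_T)
    for t in reversed(range(len(alphas))):
        eps = predict_noise(z, t)
        coef = (1.0 - alphas[t]) / math.sqrt(1.0 - alpha_bars[t])
        z = [(zi - coef * ei) / math.sqrt(alphas[t]) for zi, ei in zip(z, eps)]
    return z


def make_oracle(z0, alpha_bars):
    """Hypothetical stand-in for a trained denoising network: it returns the
    exact noise implied by z_t and the known clean vector z0."""
    def predict_noise(z_t, t):
        s, n = math.sqrt(alpha_bars[t]), math.sqrt(1.0 - alpha_bars[t])
        return [(zi - s * z0i) / n for zi, z0i in zip(z_t, z0)]
    return predict_noise


if __name__ == "__main__":
    rng = random.Random(0)
    z0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy compact vector
    alphas, alpha_bars = make_schedule(num_steps=4)
    z_T = forward_diffuse(z0, alpha_bars[-1], rng)
    z_hat = reverse_sample(z_T, make_oracle(z0, alpha_bars), alphas, alpha_bars)
    print("max abs error:", max(abs(a - b) for a, b in zip(z_hat, z0)))
```

Each reverse step here touches only 16 numbers, whereas an image-space DM would process every pixel or feature-map entry at every step. With a learned noise predictor in place of the oracle, the recovered vector would carry estimation error, which is the error the joint optimization described above is meant to reduce.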