DiffBFR: Bootstrapping Diffusion Model Towards Blind Face Restoration

Blind face restoration (BFR) is important yet challenging. Prior works favor GAN-based frameworks for this task because they balance quality and efficiency. However, these methods suffer from poor stability and adapt poorly to long-tail distributions, so they fail to retain source identity and restore detail at the same time. We propose DiffBFR, which introduces the Diffusion Probabilistic Model (DPM) to BFR to tackle these problems, given its advantages over GANs in avoiding training collapse and in generating long-tail distributions. DiffBFR uses a two-step design that first restores identity information from low-quality (LQ) images and then enhances texture details according to the distribution of real faces. This design is implemented with two key components: 1) An Identity Restoration Module (IRM) for preserving face details in the results. Instead of denoising from a pure Gaussian random distribution with LQ images as the condition during the reverse process, we propose a novel truncated sampling method that starts from LQ images with partial noise added. We theoretically prove that this change shrinks the evidence lower bound of the DPM and thereby restores more original details. Building on this proof, we introduce two cascaded conditional DPMs with different input sizes to strengthen this sampling effect and to reduce the training difficulty of generating high-resolution images directly. 2) A Texture Enhancement Module (TEM) for polishing the texture of the image. Here an unconditional DPM, an LQ-free model, is introduced to further force the restorations to appear realistic. We theoretically prove that this unconditional DPM, trained on pure high-quality (HQ) images, contributes to justifying the correct distribution of the inference images output by the IRM in pixel-level space. Truncated sampling with a fractional time step is used to polish pixel-level textures while preserving identity information.
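The truncated sampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear noise schedule, the standard DDPM ancestral update, and the names `make_alpha_bars`, `truncated_start`, `truncated_sample`, and `predict_noise` are all assumptions made for illustration, and the denoiser is a placeholder rather than a trained network. The point is only that the reverse process begins at an intermediate step `t_start` from a partially noised LQ image, not at the final step from pure Gaussian noise.

```python
import numpy as np


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def truncated_start(lq_image, t_start, alpha_bars, rng):
    """Diffuse the LQ image forward to step t_start instead of drawing pure noise."""
    a_bar = alpha_bars[t_start - 1]
    noise = rng.standard_normal(lq_image.shape)
    return np.sqrt(a_bar) * lq_image + np.sqrt(1.0 - a_bar) * noise


def truncated_sample(lq_image, t_start, alpha_bars, predict_noise, rng):
    """Run the standard DDPM ancestral update from t_start down to step 1."""
    # Recover per-step alphas from their cumulative products.
    alphas = np.concatenate(([alpha_bars[0]], alpha_bars[1:] / alpha_bars[:-1]))
    x = truncated_start(lq_image, t_start, alpha_bars, rng)
    for t in range(t_start, 0, -1):
        a_t = alphas[t - 1]
        a_bar_t = alpha_bars[t - 1]
        # Hypothetical conditional denoiser: sees the noisy image, step, and LQ image.
        eps = predict_noise(x, t, lq_image)
        mean = (x - (1.0 - a_t) / np.sqrt(1.0 - a_bar_t) * eps) / np.sqrt(a_t)
        if t > 1:
            # Variance choice sigma_t^2 = beta_t = 1 - alpha_t.
            x = mean + np.sqrt(1.0 - a_t) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


def zero_denoiser(x, t, cond):
    """Placeholder for a trained noise-prediction network (always predicts zero)."""
    return np.zeros_like(x)


rng = np.random.default_rng(0)
alpha_bars = make_alpha_bars()
lq = rng.uniform(-1.0, 1.0, size=(8, 8, 3))  # stand-in for a normalized LQ face
restored = truncated_sample(lq, t_start=100, alpha_bars=alpha_bars,
                            predict_noise=zero_denoiser, rng=rng)
```

In this sketch a smaller `t_start` keeps the starting point closer to the LQ input, while a larger one approaches ordinary sampling from noise. How the paper selects the truncation step, and how the fractional time step in the TEM is defined, is not stated in the abstract and is not modeled here.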
