Diffusion-based models are effective for image restoration, but their iterative denoising process starts from Gaussian noise, which makes inference slow. The Image-to-Image Schr\"odinger Bridge (I$^2$SB) offers a promising alternative: it starts the generative process from the corrupted image and reuses training techniques from score-based diffusion models. In this paper, we introduce the Implicit Image-to-Image Schr\"odinger Bridge (I$^3$SB) to further accelerate the generative process of I$^2$SB. I$^3$SB reformulates the generative process as a non-Markovian one by incorporating the initial corrupted image into every step, while ensuring that the marginal distribution matches that of I$^2$SB. As a result, the network pretrained for I$^2$SB can be used directly. Extensive experiments on natural images, human face images, and medical images validate the acceleration benefits of I$^3$SB. Compared to I$^2$SB, I$^3$SB reaches the same perceptual quality in fewer generative steps while maintaining equal or better fidelity to the ground truth.