Diffusion models (DMs) are capable of generating remarkably high-quality samples by iteratively denoising a random vector, a process that corresponds to moving along the probability flow ordinary differential equation (PF ODE). Interestingly, DMs can also invert an input image to noise by moving backward along the PF ODE, a key operation for downstream tasks such as interpolation and image editing. However, the iterative nature of this process restricts its speed, hindering its broader application. Recently, Consistency Models (CMs) have emerged to address this challenge by approximating the integral of the PF ODE, greatly reducing the number of iterations. Yet, the absence of an explicit ODE solver complicates the inversion process. To resolve this, we introduce the Bidirectional Consistency Model (BCM), which learns a single neural network that enables both forward and backward traversal along the PF ODE, efficiently unifying generation and inversion tasks within one framework. We can train BCM from scratch or fine-tune it from a pretrained consistency model, which reduces the training cost and increases scalability. We demonstrate that BCM enables one-step generation and inversion while also allowing the use of additional steps to enhance generation quality or reduce reconstruction error. We further showcase BCM's capability in downstream tasks, such as interpolation, inpainting, and blind restoration of compressed images. Notably, when the number of function evaluations (NFE) is constrained, BCM surpasses domain-specific restoration methods, such as I$^2$SB and Palette, in a fully zero-shot manner, offering an efficient alternative for inverse problems. Our code and weights are available at https://github.com/Mosasaur5526/BCM-iCT-torch.