Diffusion models (DMs) are capable of generating remarkably high-quality samples by iteratively denoising a random vector, a process that corresponds to moving along the probability flow ordinary differential equation (PF ODE). Interestingly, DMs can also invert an input image to noise by moving backward along the PF ODE, a key operation for downstream tasks such as interpolation and image editing. However, the iterative nature of this process restricts its speed, hindering its broader application. Recently, Consistency Models (CMs) have emerged to address this challenge by approximating the integral of the PF ODE, greatly reducing the number of iterations. Yet, the absence of an explicit ODE solver complicates the inversion process. To resolve this, we introduce the Bidirectional Consistency Model (BCM), which learns a single neural network that enables both forward and backward traversal along the PF ODE, efficiently unifying generation and inversion tasks within one framework. We can train BCM from scratch or fine-tune it from a pretrained consistency model, which reduces the training cost and increases scalability. We demonstrate that BCM enables one-step generation and inversion while also allowing the use of additional steps to enhance generation quality or reduce reconstruction error. We further showcase BCM's capability in downstream tasks, such as interpolation, inpainting, and blind restoration of compressed images. Notably, when the number of function evaluations (NFE) is constrained, BCM surpasses domain-specific restoration methods, such as I$^2$SB and Palette, in a fully zero-shot manner, offering an efficient alternative for inverse problems. Our code and weights are available at https://github.com/Mosasaur5526/BCM-iCT-torch.