Diffusion models (DMs) have become the dominant paradigm of generative modeling in a variety of domains by learning stochastic processes from noise to data. Recently, diffusion denoising bridge models (DDBMs), a new formulation of generative modeling that builds stochastic processes between fixed data endpoints based on a reference diffusion process, have achieved empirical success across tasks with coupled data distributions, such as image-to-image translation. However, the sampling process of DDBMs typically requires hundreds of network evaluations to achieve decent performance, and this high computational demand may impede their practical deployment. In this work, inspired by recent advances in consistency models for DMs, we tackle this problem by learning the consistency function of the probability-flow ordinary differential equation (PF-ODE) of DDBMs, which directly predicts the solution at a starting step given any point on the ODE trajectory. Based on a dedicated general-form ODE solver, we propose two paradigms, consistency bridge distillation and consistency bridge training, both of which can be flexibly applied to DDBMs with a broad range of design choices. Experimental results show that our proposed method can sample $4\times$ to $50\times$ faster than the base DDBM and produces better visual quality given the same number of sampling steps across various tasks with pixel resolutions ranging from $64 \times 64$ to $256 \times 256$. It also supports downstream tasks such as semantic interpolation in the data space.
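As a rough illustration of what "learning the consistency function" means here, the defining property can be sketched as follows. The notation is an assumption made for this sketch and is not taken from the abstract: $x_t$ denotes a point on a PF-ODE trajectory at time $t$, $y$ the fixed endpoint that the bridge is conditioned on, $\epsilon$ the starting time at which the solution is predicted, $T$ the terminal time, and $f$ the consistency function.

$$
f(x_t, t, y) = x_\epsilon \quad \text{for all } t \in [\epsilon, T], \qquad f(x_\epsilon, \epsilon, y) = x_\epsilon .
$$

Equivalently, any two points $x_s$ and $x_t$ on the same trajectory satisfy $f(x_s, s, y) = f(x_t, t, y)$. Under this reading, a network that approximates $f$ can map a point on the trajectory to a sample in a single evaluation, which is plausibly where the reported speed-up over multi-step DDBM sampling comes from.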