Diffusion models are powerful generative models that map noise to data using stochastic processes. However, for many applications such as image editing, the model input comes from a distribution that is not random noise. As such, diffusion models must rely on cumbersome methods like guidance or projected sampling to incorporate this information in the generative process. In our work, we propose Denoising Diffusion Bridge Models (DDBMs), a natural alternative to this paradigm based on diffusion bridges, a family of processes that interpolate between two paired distributions given as endpoints. Our method learns the score of the diffusion bridge from data and maps from one endpoint distribution to the other by solving a (stochastic) differential equation based on the learned score. Our method naturally unifies several classes of generative models, such as score-based diffusion models and OT-Flow-Matching, allowing us to adapt existing design and architectural choices to our more general problem. Empirically, we apply DDBMs to challenging image datasets in both pixel and latent space. On standard image translation problems, DDBMs achieve significant improvement over baseline methods, and, when we reduce the problem to image generation by setting the source distribution to random noise, DDBMs achieve comparable FID scores to state-of-the-art methods despite being built for a more general task.
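As a rough illustration of what "interpolating between two paired endpoints" can mean, the sketch below samples an intermediate state of a Brownian bridge, the simplest process pinned at both ends. This is a toy under stated assumptions, not the DDBM implementation: the function name `sample_brownian_bridge`, the constant diffusion scale `sigma`, and the horizon `T` are illustrative choices, and no score is learned here. It relies only on the standard fact that a Brownian motion conditioned on both endpoints has a Gaussian marginal with mean (1 - t/T) x_0 + (t/T) x_T and variance sigma^2 t (1 - t/T).

```python
import numpy as np


def sample_brownian_bridge(x0, xT, t, T=1.0, sigma=1.0, rng=None):
    """Draw x_t ~ q(x_t | x_0, x_T) for a Brownian motion pinned at both ends.

    Hypothetical helper for illustration only; a constant diffusion
    scale `sigma` is assumed, which is the simplest possible bridge.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    xT = np.asarray(xT, dtype=float)
    w = t / T                               # fraction of the horizon elapsed
    mean = (1.0 - w) * x0 + w * xT          # linear interpolation of endpoints
    std = sigma * np.sqrt(t * (1.0 - w))    # vanishes at t = 0 and t = T
    return mean + std * rng.standard_normal(x0.shape)


# Paired endpoints (e.g. a source image and its target), here as toy vectors.
x0 = np.zeros(4)
xT = np.full(4, 2.0)
x_mid = sample_brownian_bridge(x0, xT, t=0.5, rng=np.random.default_rng(0))
```

The noise vanishes at both endpoints, so every sample path starts exactly at `x0` and ends exactly at `xT`. A bridge model in the sense described above would train a network on such noisy intermediate states and then integrate a learned (stochastic) differential equation from one endpoint to the other.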