Diffusion models are powerful generative models that map noise to data using stochastic processes. However, for many applications such as image editing, the model input comes from a distribution that is not random noise. As such, diffusion models must rely on cumbersome methods like guidance or projected sampling to incorporate this information in the generative process. In our work, we propose Denoising Diffusion Bridge Models (DDBMs), a natural alternative to this paradigm based on diffusion bridges, a family of processes that interpolate between two paired distributions given as endpoints. Our method learns the score of the diffusion bridge from data and maps from one endpoint distribution to the other by solving a (stochastic) differential equation based on the learned score. Our method naturally unifies several classes of generative models, such as score-based diffusion models and OT-Flow-Matching, allowing us to adapt existing design and architectural choices to our more general problem. Empirically, we apply DDBMs to challenging image datasets in both pixel and latent space. On standard image translation problems, DDBMs achieve significant improvement over baseline methods, and, when we reduce the problem to image generation by setting the source distribution to random noise, DDBMs achieve comparable FID scores to state-of-the-art methods despite being built for a more general task.
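To make the notion of a diffusion bridge concrete, the following is a minimal sketch of the simplest such process, a Brownian bridge pinned at two fixed endpoints. It is not the DDBM method itself, which learns the bridge score from paired data; it only illustrates how a stochastic process can interpolate between a given source and target. The function name `sample_brownian_bridge` and the parameters `T` and `sigma` are hypothetical choices made for illustration.

```python
import numpy as np

def sample_brownian_bridge(x0, xT, t, T=1.0, sigma=1.0, rng=None):
    """Draw x_t from a Brownian bridge pinned at x0 (time 0) and xT (time T).

    Illustrative assumption: constant diffusion coefficient sigma, so that
    x_t ~ N((1 - t/T) * x0 + (t/T) * xT, sigma^2 * t * (T - t) / T).
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    xT = np.asarray(xT, dtype=float)
    # The mean interpolates linearly between the two endpoints.
    mean = (1.0 - t / T) * x0 + (t / T) * xT
    # The standard deviation vanishes at both endpoints and peaks at t = T/2.
    std = sigma * np.sqrt(t * (T - t) / T)
    return mean + std * rng.standard_normal(x0.shape)

# Usage: sample the midpoint of a bridge between two toy "images".
rng = np.random.default_rng(0)
source = np.zeros((2, 2))
target = np.ones((2, 2))
midpoint = sample_brownian_bridge(source, target, t=0.5, rng=rng)
```

At `t = 0` and `t = T` the noise term is zero, so the sample coincides exactly with the respective endpoint; in between, the process is noisy but remains tied to both ends.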