Recent years have witnessed significant progress in efficient training and fast sampling approaches for diffusion models. A notable recent advance is the use of stochastic differential equations (SDEs) to describe data perturbation and generative modeling in a unified mathematical framework. In this paper, we reveal several intriguing geometric structures of diffusion models and contribute a simple yet powerful interpretation of their sampling dynamics. By carefully inspecting a popular variance-exploding SDE and its marginal-preserving ordinary differential equation (ODE) for sampling, we discover that the data distribution and the noise distribution are smoothly connected by an explicit, quasi-linear sampling trajectory and by another, implicit denoising trajectory, which even converges faster in terms of visual quality. We also establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm, with which we can characterize the asymptotic behavior of diffusion models and identify the score deviation. These new geometric observations enable us to improve previous sampling algorithms, re-examine latent interpolation, and re-explain the working principles of distillation-based fast sampling techniques.
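The link between optimal denoising and mean shift mentioned above can be illustrated with a minimal sketch. It assumes the data distribution is the empirical distribution over a finite dataset and that noise is added as in a variance-exploding setup, x = y + sigma * eps. Under those assumptions the optimal denoiser (the posterior mean of y given x) is a softmax-weighted average of the data points, which coincides with one Gaussian-kernel mean-shift step whose bandwidth equals sigma. The function names `optimal_denoiser` and `mean_shift_step` and the toy data are hypothetical and chosen for illustration; this is not the paper's code.

```python
import numpy as np

def optimal_denoiser(x, data, sigma):
    """Posterior mean E[y | x] for x = y + sigma * eps, with y uniform over `data`.

    Assumption: the data distribution is the empirical distribution of `data`.
    """
    sq_dist = np.sum((data - x) ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()
    return weights @ data

def mean_shift_step(x, data, bandwidth):
    """One classic mean-shift update with a Gaussian kernel."""
    sq_dist = np.sum((data - x) ** 2, axis=1)
    # Shifting by the minimum distance rescales all kernel values equally,
    # so the normalized average is unchanged but underflow is avoided.
    kernel = np.exp(-(sq_dist - sq_dist.min()) / (2.0 * bandwidth ** 2))
    return (kernel[:, None] * data).sum(axis=0) / kernel.sum()

# Toy 2-D dataset (illustrative only).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
x = np.array([3.0, -2.0])
sigma = 1.5

denoised = optimal_denoiser(x, data, sigma)
shifted = mean_shift_step(x, data, bandwidth=sigma)

# Limiting behavior: large sigma pulls toward the data mean,
# small sigma pulls toward the nearest data point.
far = optimal_denoiser(x, data, sigma=1e3)
near = optimal_denoiser(data[7] + 1e-6, data, sigma=1e-3)
```

In this toy setting `denoised` and `shifted` agree up to floating-point error. The two limits also hint at the asymptotic behavior the abstract refers to: at high noise the denoiser output sits near the dataset mean, and at low noise it collapses onto a single data point.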