Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, owing to their state-of-the-art performance in many generation tasks, while resting on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, ODE-based samples have been reported to be inferior to SDE-based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, their associated Fokker--Planck equations, and the neural network approximations of these Fokker--Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models and link it to an associated Fokker--Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker--Planck residual. We also show numerically, through explicit comparisons, that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions. Moreover, we show numerically that reducing the Fokker--Planck residual, by adding it as an additional regularisation term, closes the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, but that this can come at the cost of degraded SDE sample quality.
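To make the ODE-versus-SDE gap concrete, the following is a minimal toy sketch; it is not the paper's method, bound, or experiment. It assumes 1-D Gaussian data, the forward SDE dx = -x dt + sqrt(2) dW, and a hypothetical "learned" linear score s_theta(x, t) = -(1 + eps) x / v(t) with a deliberate error `eps`. Because all dynamics are linear, both the probability-flow ODE and the reverse SDE keep the samples zero-mean Gaussian, so the code integrates their variances with RK4 instead of simulating particles and compares them with the closed-form 1-D Wasserstein-2 distance. The helper `fp_residual` is a hypothetical variance-level proxy for a Fokker--Planck residual, not the residual defined in the paper; all function names are illustrative.

```python
import math

# Toy assumptions (illustrative only, not the paper's setup):
#   data x0 ~ N(0, sigma0_sq); forward SDE dx = -x dt + sqrt(2) dW,
#   i.e. drift f(x, t) = -x and diffusion g(t)^2 = 2.
# Then p_t = N(0, v(t)) with v(t) = 1 + (sigma0_sq - 1) exp(-2t),
# and the exact score is -x / v(t).


def true_var(t, sigma0_sq):
    """Variance v(t) of the exact forward marginal p_t."""
    return 1.0 + (sigma0_sq - 1.0) * math.exp(-2.0 * t)


def model_coef(t, sigma0_sq, eps):
    """Hypothetical learned linear score s_theta(x, t) = -c(t) x.

    eps is a deliberate multiplicative error; eps = 0 gives the exact score.
    """
    return (1.0 + eps) / true_var(t, sigma0_sq)


def rk4(rhs, y, tau0, tau1, n):
    """Classical fourth-order Runge-Kutta for a scalar ODE dy/dtau = rhs(tau, y)."""
    h = (tau1 - tau0) / n
    tau = tau0
    for _ in range(n):
        k1 = rhs(tau, y)
        k2 = rhs(tau + h / 2, y + h * k1 / 2)
        k3 = rhs(tau + h / 2, y + h * k2 / 2)
        k4 = rhs(tau + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        tau += h
    return y


def terminal_variances(sigma0_sq=0.25, eps=0.1, T=5.0, n=20000):
    """Variances at t = 0 produced by the probability-flow ODE and the reverse SDE.

    Both start from the exact marginal N(0, v(T)) and use s_theta. In reverse
    time tau = T - t, with s_theta(x, t) = -c(t) x:
      ODE: dx/dtau = (1 - c) x                  => dV/dtau = 2 (1 - c) V
      SDE: dx = (1 - 2c) x dtau + sqrt(2) dW    => dV/dtau = 2 (1 - 2c) V + 2
    Linear dynamics keep both laws zero-mean Gaussian, so tracking V suffices.
    """
    def c(tau):
        return model_coef(T - tau, sigma0_sq, eps)

    v_T = true_var(T, sigma0_sq)
    v_ode = rk4(lambda tau, V: 2.0 * (1.0 - c(tau)) * V, v_T, 0.0, T, n)
    v_sde = rk4(lambda tau, V: 2.0 * (1.0 - 2.0 * c(tau)) * V + 2.0, v_T, 0.0, T, n)
    return v_ode, v_sde


def w2_zero_mean_gaussians(var_a, var_b):
    """Wasserstein-2 distance between N(0, var_a) and N(0, var_b) in 1-D."""
    return abs(math.sqrt(var_a) - math.sqrt(var_b))


def fp_residual(t, sigma0_sq=0.25, eps=0.1, h=1e-5):
    """Hypothetical variance-level proxy for a Fokker-Planck residual.

    s_theta is the score of N(0, q(t)) with q = 1/c. A zero-mean Gaussian family
    solves this SDE's Fokker-Planck equation iff dq/dt = -2 q + 2, so the
    mismatch returned here is zero exactly when q satisfies that equation.
    """
    def q(s):
        return 1.0 / model_coef(s, sigma0_sq, eps)

    dq_dt = (q(t + h) - q(t - h)) / (2.0 * h)  # central finite difference
    return dq_dt - (-2.0 * q(t) + 2.0)


v_ode, v_sde = terminal_variances(eps=0.1)
print(f"ODE var {v_ode:.4f}, SDE var {v_sde:.4f}, target 0.25")
print(f"W2(ODE, SDE) = {w2_zero_mean_gaussians(v_ode, v_sde):.4f}")
print(f"variance-level FP residual at t=1: {fp_residual(1.0):.4f}")
```

With `eps=0` both samplers recover the data variance and the proxy residual vanishes. With `eps=0.1` the residual is nonzero and the ODE variance drifts much further from the target than the SDE variance, which in this toy reflects the extra contraction that the SDE's noise-plus-drift correction provides against score errors.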