Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the-art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE-based samples are inferior to SDE-based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, as well as their associated Fokker-Planck equations and the neural network approximations of these Fokker-Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, and link it to an associated Fokker-Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker-Planck residual. We also show numerically, using explicit comparisons, that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions. Moreover, we show numerically that reducing the Fokker-Planck residual by adding it as an additional regularisation term closes the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, though this can come at the cost of degraded SDE sample quality.
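
The ODE-SDE gap described above can be illustrated on a toy problem. The following is a minimal sketch and not the paper's experimental setup. It assumes a one-dimensional Gaussian data distribution, the forward SDE dx = -x dt + sqrt(2) dW, and a hypothetical `scale` factor applied to the exact score as a stand-in for the error of a learned score network. All names (`S0`, `T`, `marginal_var`, `score`, `sample`, `gaussian_w2`) are chosen for illustration only. Because every distribution involved stays Gaussian, the Wasserstein 2-distance between the ODE and SDE samples can be estimated from sample means and standard deviations.

```python
import numpy as np

S0 = 0.5  # assumed std of the toy data distribution N(0, S0^2)
T = 5.0   # assumed final time of the forward process


def marginal_var(t):
    # Variance of p_t for the forward SDE dx = -x dt + sqrt(2) dW started at N(0, S0^2).
    return (S0**2 - 1.0) * np.exp(-2.0 * t) + 1.0


def score(x, t, scale=1.0):
    # Exact score of p_t when scale == 1.0.
    # scale != 1.0 is a hypothetical perturbation mimicking an imperfect learned score.
    return -scale * x / marginal_var(t)


def sample(kind, scale=1.0, n=20000, steps=1000, seed=0):
    # Euler integration from t = T down to t = 0, starting from the prior N(0, 1).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    dt = T / steps
    for i in range(steps):
        t = T - i * dt
        s = score(x, t, scale)
        if kind == "ode":
            # Probability flow ODE: dx/dt = f - 0.5 * g^2 * s = -x - s.
            x = x - dt * (-x - s)
        else:
            # Reverse-time SDE: drift f - g^2 * s = -x - 2 s, diffusion g = sqrt(2).
            noise = rng.standard_normal(n)
            x = x - dt * (-x - 2.0 * s) + np.sqrt(2.0 * dt) * noise
    return x


def gaussian_w2(a, b):
    # Closed-form W2 between two 1D Gaussians, estimated from sample moments.
    return np.sqrt((a.mean() - b.mean()) ** 2 + (a.std() - b.std()) ** 2)


# Gap between ODE and SDE samples with the exact score and with a perturbed score.
gap_exact = gaussian_w2(sample("ode", 1.0), sample("sde", 1.0))
gap_perturbed = gaussian_w2(sample("ode", 1.3), sample("sde", 1.3))
print("W2 gap, exact score:    ", gap_exact)
print("W2 gap, perturbed score:", gap_perturbed)
```

With the exact score, the two samplers should agree up to discretisation and sampling error, since the ODE and the reverse SDE share the same marginals. With the perturbed score, the field no longer matches the marginals of the forward process, and the ODE- and SDE-induced distributions separate. This is the kind of discrepancy that the abstract links to a Fokker-Planck residual.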

