A Complete Recipe for Diffusion Generative Models

Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at https://github.com/mandt-lab/PSLD.
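To make the idea of a forward process in an augmented phase space concrete, here is a minimal sketch. It is not taken from the paper or from the linked repository. It assumes the general sampler form dz = -(D + Q) ∇H(z) dt + sqrt(2D) dW, with D positive semi-definite and Q skew-symmetric, whose stationary density is proportional to exp(-H(z)). The function name `simulate_forward`, the 1-D toy data, and the values of `beta`, `gamma`, `nu`, and `mass` are all illustrative assumptions.

```python
import numpy as np


def simulate_forward(x0, beta=8.0, gamma=0.04, nu=4.0, mass=0.25,
                     t_end=1.0, n_steps=2000, seed=0):
    """Euler-Maruyama simulation of a linear diffusion on z = (x, m).

    Assumed (illustrative) setup, not the authors' implementation:
      H(z) = x^2 / 2 + m^2 / (2 * mass)   -> target N(0, 1) x N(0, mass)
      D    = (beta / 2) * diag(gamma, mass * nu)   (positive semi-definite)
      Q    = (beta / 2) * [[0, -1], [1, 0]]        (skew-symmetric)
      dz   = -(D + Q) grad H(z) dt + sqrt(2 D) dW
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # Auxiliary "momentum" variable, initialised from its stationary law.
    m = rng.normal(0.0, np.sqrt(mass), size=x.shape)

    D = 0.5 * beta * np.array([[gamma, 0.0], [0.0, mass * nu]])
    Q = 0.5 * beta * np.array([[0.0, -1.0], [1.0, 0.0]])
    # grad H(z) = diag(1, 1/mass) @ z, so the drift is linear: f(z) = A @ z.
    A = -(D + Q) @ np.diag([1.0, 1.0 / mass])
    noise_scale = np.sqrt(2.0 * np.diag(D))  # per-coordinate diffusion

    dt = t_end / n_steps
    z = np.stack([x, m])  # shape (2, n_samples)
    for _ in range(n_steps):
        dw = rng.normal(size=z.shape) * np.sqrt(dt)
        z = z + (A @ z) * dt + noise_scale[:, None] * dw
    return z[0], z[1]


if __name__ == "__main__":
    # Toy bimodal "data" distribution concentrated at -2 and +2.
    data = np.concatenate([np.full(10_000, -2.0), np.full(10_000, 2.0)])
    x_T, m_T = simulate_forward(data)
    print("x mean/var:", x_T.mean(), x_T.var())
    print("m var:", m_T.var())
```

Under these assumptions, the samples should end up close to a standard Gaussian in `x` and a Gaussian with variance `mass` in `m`, which is the sense in which such a forward process converges to a chosen target distribution.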
