Score-based Generative Models (SGMs) have demonstrated exceptional synthesis quality across a variety of tasks. However, the design space of the forward diffusion process remains largely unexplored, and existing designs often rely on physical heuristics or simplifying assumptions. Drawing on insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs that ensures convergence to the desired target distribution. We show that several existing SGMs are special cases of our framework. Building on this recipe, we introduce Phase Space Langevin Diffusion (PSLD), which performs score-based modeling in a space augmented with auxiliary variables, akin to physical phase space. Empirically, PSLD achieves better sample quality and a better speed-quality trade-off than various competing approaches on established image synthesis benchmarks. Notably, PSLD achieves sample quality comparable to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD to conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future work. Code and model checkpoints are available at \url{https://github.com/mandt-lab/PSLD}.
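As a rough illustration of what a convergence-guaranteeing recipe for a forward process can look like, the sketch below assumes the form commonly used in the stochastic-gradient MCMC literature: a drift $-(D + Q)\nabla H(z)$ with diffusion $\sqrt{2D}$, where $D$ is positive semidefinite and $Q$ is skew-symmetric. It takes a standard Gaussian target, $H(z) = \|z\|^2 / 2$, over a two-dimensional (position, auxiliary variable) state, and checks stationarity through the Lyapunov equation $A\Sigma + \Sigma A^\top + BB^\top = 0$. The function names, the matrices `D` and `Q`, and the values of `gamma` and `nu` are illustrative assumptions, not the parameterization used by PSLD.

```python
import numpy as np


def recipe_linear_sde(D, Q):
    """Build the linear SDE dz = A z dt + B dW for the target N(0, I).

    Assumes H(z) = |z|^2 / 2, so grad H(z) = z and the drift -(D + Q) grad H
    is linear. Returns the drift matrix A and the product B @ B.T = 2 D.
    """
    D = np.asarray(D, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if not np.allclose(D, D.T) or np.any(np.linalg.eigvalsh(D) < 0):
        raise ValueError("D must be symmetric positive semidefinite")
    if not np.allclose(Q, -Q.T):
        raise ValueError("Q must be skew-symmetric")
    A = -(D + Q)
    BBt = 2.0 * D
    return A, BBt


def stationarity_residual(A, BBt, Sigma):
    """Residual of the Lyapunov equation; zero iff N(0, Sigma) is stationary."""
    return A @ Sigma + Sigma @ A.T + BBt


# Hypothetical damping coefficients for the position and auxiliary coordinates.
gamma, nu = 0.01, 4.0
D = np.diag([gamma, nu])                 # noise injected into both coordinates
Q = np.array([[0.0, -1.0], [1.0, 0.0]])  # couples position and auxiliary variable

A, BBt = recipe_linear_sde(D, Q)
residual = stationarity_residual(A, BBt, np.eye(2))
```

The skew-symmetric part `Q` cancels in `A + A.T`, so the residual vanishes for any admissible `D` and `Q`. This is the sense in which such a recipe leaves the target distribution invariant by construction, while still leaving freedom in how the position and auxiliary coordinates are coupled and damped.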