Score-based Generative Models (SGMs) have achieved state-of-the-art synthesis results on diverse tasks. However, the current design space of the forward diffusion process is largely unexplored and often relies on physical intuition or simplifying assumptions. Leveraging results from the design of scalable Bayesian posterior samplers, we present a complete recipe for constructing forward processes in SGMs, all of which are guaranteed to converge to the target distribution of interest. We show that several existing SGMs can be cast as specific instantiations of this parameterization. Furthermore, building on this recipe, we construct a novel SGM: Phase Space Langevin Diffusion (PSLD), which performs score-based modeling in a space augmented with auxiliary variables akin to a physical phase space. We show that PSLD outperforms competing baselines in terms of sample quality and the speed-vs-quality tradeoff across different samplers on various standard image synthesis benchmarks. Moreover, we show that PSLD achieves sample quality comparable to state-of-the-art SGMs (FID: 2.10 on unconditional CIFAR-10 generation), providing an attractive alternative as an SGM backbone for further development. We will publish our code and model checkpoints for reproducibility at https://github.com/mandt-lab/PSLD.
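As an illustration of the kind of parameterization described above, the following is a minimal sketch and not the released PSLD implementation. It assumes the recipe takes the drift form known from the stochastic-gradient MCMC literature, dz = -(D + Q) ∇H(z) dt + sqrt(2D) dW, with D positive definite, Q skew-symmetric, both constant, and H(z) = ||z||²/2, so that the stationary distribution is a standard Gaussian. The function name, the matrices D and Q, the two-dimensional (position, momentum) state, and the step sizes are all hypothetical choices made for this example.

```python
import numpy as np

def simulate_recipe_sde(D, Q, z0, dt=0.01, n_steps=2000, seed=0):
    """Euler-Maruyama simulation of dz = -(D + Q) grad H(z) dt + sqrt(2 D) dW.

    Assumes H(z) = 0.5 * ||z||^2 (so grad H(z) = z), constant positive
    definite D and constant skew-symmetric Q. Each row of z0 is one particle.
    """
    rng = np.random.default_rng(seed)
    A = D + Q                                   # drift matrix
    noise_scale = np.linalg.cholesky(2.0 * D)   # L with L @ L.T = 2 D
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        grad_H = z                              # gradient of the quadratic H
        eps = rng.standard_normal(z.shape)
        # Row-vector form of z <- z - dt * A z + sqrt(dt) * L eps
        z = z - dt * grad_H @ A.T + np.sqrt(dt) * eps @ noise_scale.T
    return z

# Hypothetical 2-D phase space: first coordinate "data", second "momentum".
D = np.array([[0.2, 0.0],
              [0.0, 1.0]])      # positive definite diffusion matrix
Q = np.array([[0.0, -1.0],
              [1.0,  0.0]])     # skew-symmetric coupling matrix

z0 = np.tile([3.0, -3.0], (10000, 1))   # all particles start far from the origin
zT = simulate_recipe_sde(D, Q, z0)
print("empirical mean:", zT.mean(axis=0))
print("empirical covariance:\n", np.cov(zT.T))
```

Under these assumptions, the empirical mean should be close to zero and the empirical covariance close to the identity, up to discretization and Monte Carlo error, regardless of the particular D and Q chosen.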