Diffusion models (DMs) have demonstrated strong image generation capabilities across a variety of generative modeling tasks. However, their main limitation is slow sampling: generating high-quality images requires hundreds or thousands of sequential function evaluations through large neural networks. Sampling from DMs can alternatively be viewed as solving the corresponding stochastic differential equations (SDEs) or ordinary differential equations (ODEs). In this work, we formulate the sampling process as an extended reverse-time SDE (ER SDE), which unifies prior explorations of ODEs and SDEs. Leveraging the semi-linear structure of ER SDE solutions, we provide exact solutions and arbitrarily high-order approximate solutions for the VP SDE and the VE SDE, respectively. Based on the solution space of the ER SDE, we offer mathematical insights that explain why ODE solvers outperform SDE solvers for fast sampling. We further show that VP SDE solvers perform on par with their VE SDE counterparts. Finally, we devise fast, training-free samplers, ER-SDE-Solvers, which achieve state-of-the-art performance among all stochastic samplers. Experimental results show an FID of 3.45 with 20 function evaluations and 2.24 with 50 function evaluations on the ImageNet $64\times64$ dataset.