Diffusion models (DMs) have made significant progress in image, audio, and video generation. One downside of DMs is their slow iterative sampling process. Recent fast-sampling algorithms are designed from the perspective of differential equations. However, in higher-order algorithms based on Taylor expansion, estimating the derivative of the score function becomes intractable due to the complexity of large-scale, well-trained neural networks. Motivated by this, we introduce the recursive difference (RD) method to calculate the derivative of the score function in the setting of DMs. Based on the RD method and the truncated Taylor expansion of the score-integrand, we propose SciRE-Solver, which comes with a convergence order guarantee, for accelerating the sampling of DMs. To further investigate the effectiveness of the RD method, we also propose a variant named SciREI-Solver, based on the RD method and an exponential integrator. Our proposed sampling algorithms with the RD method attain state-of-the-art (SOTA) FIDs compared with existing training-free sampling algorithms, on both discrete-time and continuous-time pre-trained DMs, across various numbers of score function evaluations (NFE). Remarkably, SciRE-Solver with a small NFE shows promising potential to surpass the FID reported for some pre-trained models in their original papers, which used no fewer than $1000$ NFE. For example, we reach a SOTA value of $2.40$ FID with $100$ NFE for a continuous-time DM and of $3.15$ FID with $84$ NFE for a discrete-time DM on CIFAR-10, as well as $2.17$ ($2.02$) FID with $18$ ($50$) NFE for a discrete-time DM on CelebA 64$\times$64.
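To illustrate the general idea of replacing an intractable derivative with a difference of function evaluations inside a truncated Taylor step, the following is a minimal sketch on a toy scalar ODE. It is not the RD method or SciRE-Solver as defined in the paper. The function names (`taylor2_fd_step`, `integrate`, `decay`), the probe ratio `r`, and the test problem $dx/dt = -x$ are all assumptions chosen for illustration.

```python
import math


def euler_step(f, x, t, h):
    """First-order baseline: one function evaluation per step."""
    return x + h * f(x, t)


def taylor2_fd_step(f, x, t, h, r=0.5):
    """Second-order truncated Taylor step.

    The total derivative of f along the trajectory is not computed
    analytically; it is estimated from a difference of two evaluations
    of f (hypothetical illustration, not the paper's RD scheme).
    """
    f0 = f(x, t)
    # Probe point: a partial Euler move to the intermediate time t + r*h.
    x_probe = x + r * h * f0
    f1 = f(x_probe, t + r * h)
    # Finite-difference estimate of d/dt f(x(t), t).
    df_dt = (f1 - f0) / (r * h)
    return x + h * f0 + 0.5 * h * h * df_dt


def integrate(step, f, x0, t0, t1, n):
    """Apply `step` n times on a uniform grid from t0 to t1."""
    h = (t1 - t0) / n
    x, t = x0, t0
    for _ in range(n):
        x = step(f, x, t, h)
        t += h
    return x


def decay(x, t):
    """Toy right-hand side dx/dt = -x, with exact solution exp(-t)."""
    return -x


exact = math.exp(-1.0)
err_euler = abs(integrate(euler_step, decay, 1.0, 0.0, 1.0, 10) - exact)
err_taylor = abs(integrate(taylor2_fd_step, decay, 1.0, 0.0, 1.0, 10) - exact)
print(f"Euler error:           {err_euler:.2e}")
print(f"Taylor-2 (diff) error: {err_taylor:.2e}")
```

On this toy problem the difference-based second-order step costs two function evaluations per step and gives a noticeably smaller error than Euler at the same step count. This is the kind of trade-off between order and NFE that higher-order samplers aim to exploit.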