Recent years have witnessed the rapid progress and broad application of diffusion probabilistic models (DPMs). Sampling from a DPM can be viewed as solving an ordinary differential equation (ODE). Despite their promising performance, generation with DPMs is usually slow because it requires a large number of function evaluations (NFE). Although recent works have accelerated sampling to around 20 steps with high-order solvers, sample quality with fewer than 10 NFE can still be improved. In this paper, we propose a unified sampling framework (USF) to study the strategy choices available to a solver. Under this framework, we further reveal that taking different solving strategies at different timesteps may help further decrease the truncation error, and that a carefully designed \emph{solver schedule} has the potential to improve sample quality by a large margin. Therefore, we build the framework on the exponential integral formulation, which allows a free choice of solver strategy at each step, and we design the specific decisions that make up the framework. Moreover, we propose $S^3$, a predictor-based search method that automatically optimizes the solver schedule to achieve a better trade-off between sampling time and sample quality. We demonstrate that $S^3$ can find solver schedules that outperform state-of-the-art sampling methods on the CIFAR-10, CelebA, ImageNet, and LSUN-Bedroom datasets. Specifically, we achieve 2.69 FID with 10 NFE and 6.86 FID with 5 NFE on CIFAR-10, significantly outperforming the SOTA method. We further apply $S^3$ to the Stable-Diffusion model and obtain a 2$\times$ acceleration ratio, showing the feasibility of sampling in very few steps without retraining the neural network.
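To make the idea of a solver schedule concrete, the following is a minimal toy sketch and not the USF or $S^3$ implementation. It assumes a scalar semi-linear ODE $dx/dt = a x + \sin(t)$ standing in for the diffusion ODE, and a schedule that only chooses the order (1 or 2) of an exponential-integrator step. The names `solve`, `exact`, `phi1`, and `phi2` are hypothetical and chosen only for illustration.

```python
import math

def phi1(z):
    # phi_1(z) = (e^z - 1) / z, the weight of the zeroth-order term
    return math.expm1(z) / z

def phi2(z):
    # phi_2(z) = (e^z - 1 - z) / z^2, the weight of the first-order correction
    return (math.expm1(z) - z) / (z * z)

def exact(t):
    # Closed-form solution of dx/dt = -x + sin(t) with x(0) = 1
    return 1.5 * math.exp(-t) + (math.sin(t) - math.cos(t)) / 2.0

def solve(schedule, t0=0.0, t1=2.0, x0=1.0, a=-1.0):
    """Integrate dx/dt = a*x + sin(t) with one exponential-integrator step
    per schedule entry. Each entry is the order (1 or 2) used at that step."""
    h = (t1 - t0) / len(schedule)
    x, t = x0, t0
    f_prev = None  # nonlinear term from the previous step (multistep history)
    for order in schedule:
        f_now = math.sin(t)
        # Order 1: treat the nonlinear term as constant over the step
        x_next = math.exp(a * h) * x + h * phi1(a * h) * f_now
        # Order 2: add a backward-difference estimate of its derivative;
        # falls back to order 1 when no history is available yet
        if order == 2 and f_prev is not None:
            x_next += h * phi2(a * h) * (f_now - f_prev)
        x, t, f_prev = x_next, t + h, f_now
    return x

# Two schedules with the same number of function evaluations (10)
first_order = [1] * 10
mixed = [1] + [2] * 9

err_first = abs(solve(first_order) - exact(2.0))
err_mixed = abs(solve(mixed) - exact(2.0))
print("all first-order error:", err_first)
print("mixed schedule error: ", err_mixed)
```

Both schedules spend the same number of function evaluations, so any difference in error comes only from the per-step strategy. In this toy setting the schedule is a list of orders; a real solver schedule in the sense of the abstract would cover more decisions than the order alone.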