Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency still leaves much to be desired because sampling typically requires a large number of steps. Recent advances in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with far fewer sampling steps. While this is a significant development, most sampling methods still employ uniform time steps, which is suboptimal when the number of steps is small. To address this issue, we propose a general framework for designing an optimization problem that seeks more appropriate time steps for a specific numerical ODE solver for DPMs. The optimization problem minimizes the distance between the ground-truth solution of the ODE and the approximate solution produced by the numerical solver. It can be solved efficiently with the constrained trust region method in less than $15$ seconds. Our extensive experiments on both unconditional and conditional sampling with pixel- and latent-space DPMs demonstrate that, when combined with the state-of-the-art sampling method UniPC, our optimized time steps significantly improve image generation quality, as measured by FID scores on datasets such as CIFAR-10 and ImageNet, compared with uniform time steps.
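To make the idea of optimizing time steps concrete, the following is a minimal toy sketch and not the objective or solver used in this work. It assumes a scalar linear ODE $\mathrm{d}x/\mathrm{d}t = -a(t)\,x$ with a known closed-form solution, a first-order Euler solver standing in for a DPM solver, and SciPy's `trust-constr` method as the constrained trust region optimizer. The names `rate`, `exact_solution`, `euler_final_state`, `loss`, and `optimize_time_steps` are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import LinearConstraint, minimize

T = 1.0  # end of the integration interval (toy value, an assumption)


def rate(t):
    # Toy decay rate a(t): fast dynamics near t = 0, slow dynamics later.
    return 8.0 * np.exp(-4.0 * t)


def exact_solution(t):
    # Closed form of dx/dt = -a(t) x with x(0) = 1:
    # x(t) = exp(-int_0^t a(s) ds) = exp(-2 (1 - exp(-4 t))).
    return np.exp(-2.0 * (1.0 - np.exp(-4.0 * t)))


def euler_final_state(interior):
    # Explicit Euler on the grid 0 < t_1 < ... < t_{N-1} < T.
    grid = np.concatenate(([0.0], interior, [T]))
    x = 1.0
    for t0, t1 in zip(grid[:-1], grid[1:]):
        x = x + (t1 - t0) * (-rate(t0) * x)
    return x


def loss(interior):
    # Squared distance between the numerical and the exact final state.
    return (euler_final_state(interior) - exact_solution(T)) ** 2


def optimize_time_steps(num_steps, min_gap=1e-3):
    # Decision variables are the interior grid points; endpoints stay fixed.
    n = num_steps - 1
    uniform = np.linspace(0.0, T, num_steps + 1)[1:-1]

    # Linear constraints A @ t >= lower keep every step at least min_gap long.
    A = np.zeros((num_steps, n))
    lower = np.full(num_steps, min_gap)
    for row in range(num_steps):
        if row < n:
            A[row, row] = 1.0
        if row > 0:
            A[row, row - 1] = -1.0
    lower[-1] = min_gap - T  # last row encodes T - t_{N-1} >= min_gap
    upper = np.full(num_steps, np.inf)
    constraint = LinearConstraint(A, lower, upper)

    # Gradients are approximated by finite differences (SciPy default).
    result = minimize(loss, uniform, method="trust-constr",
                      constraints=[constraint])
    return uniform, result.x


if __name__ == "__main__":
    uniform, optimized = optimize_time_steps(5)
    print("uniform grid  :", uniform, "loss:", loss(uniform))
    print("optimized grid:", optimized, "loss:", loss(optimized))
```

In this toy setting the optimizer starts from the uniform grid and is free to move the interior points, subject only to the ordering constraint. A real DPM objective would replace `loss` with a solver-specific error measure; the structure of the constrained problem is the only part this sketch is meant to illustrate.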