Diffusion models have demonstrated exceptional capabilities in generating high-fidelity images but typically suffer from inefficient sampling. Many solver designs and noise scheduling strategies have been proposed to dramatically improve sampling speed. In this paper, we introduce a new sampling method that is up to $186\%$ faster than the current state-of-the-art solver at comparable FID on ImageNet512. This new sampling method is training-free and uses an ordinary differential equation (ODE) solver. The key to our method is the use of higher-dimensional initial noise, which makes it possible to produce more detailed samples with fewer function evaluations from existing pretrained diffusion models. In addition, by design, our solver allows the level of detail to be controlled through a single hyper-parameter at no extra computational cost. We show how our approach leverages momentum dynamics by establishing a fundamental equivalence between momentum diffusion models and conventional diffusion models with respect to their training paradigms. Moreover, we observe that the use of higher-dimensional noise naturally exhibits characteristics similar to those of stochastic differential equations (SDEs). Finally, we demonstrate strong performance on a set of representative pretrained diffusion models, including EDM, EDM2, and Stable-Diffusion 3, which cover models in both pixel and latent spaces, as well as class-conditional and text-conditional settings. The code is available at https://github.com/apple/ml-tada.