Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM, using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high-quality samples on latent diffusion models.
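The projection view can be illustrated with a toy sketch. The setup below is an assumption made for illustration only and is not the paper's experiment: the data manifold is the unit circle in the plane, the learned denoiser is replaced by the exact projection onto that circle, and the noise level sigma is taken to equal the current distance to the manifold. The names `project`, `dist`, `eps_hat`, and `sample` are hypothetical. Under these assumptions, a DDIM-style deterministic update written in a noise-level parameterization reduces to a gradient step on the Euclidean distance function.

```python
import numpy as np

def project(x):
    """Exact projection onto the unit circle (toy stand-in for a learned denoiser)."""
    return x / np.linalg.norm(x)

def dist(x):
    """Euclidean distance from x to the unit circle."""
    return abs(np.linalg.norm(x) - 1.0)

def eps_hat(x, sigma):
    """Noise prediction implied by treating the denoiser as a projection.

    When sigma equals dist(x), this is the unit-norm gradient of dist at x.
    """
    return (x - project(x)) / sigma

def sample(x, sigmas):
    """Apply the update x <- x - (sigma - sigma_next) * eps_hat(x, sigma).

    Returns the final point and the distance to the circle after each step.
    """
    distances = [dist(x)]
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = x - (sigma - sigma_next) * eps_hat(x, sigma)
        distances.append(dist(x))
    return x, distances

# Assumed schedule: start at sigma = dist(x0) = 4 and decrease linearly to 0.
sigmas = np.linspace(4.0, 0.0, 6)
x_final, distances = sample(np.array([3.0, 4.0]), sigmas)
```

In this idealized case each step shrinks the distance to the circle by exactly `sigma - sigma_next`, so the distance follows the schedule and the final iterate lands on the projection of the starting point. With a learned denoiser the projection is only approximate, which is where an assumption on the projection error becomes necessary.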