To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that typically look very different from the maximum likelihood and the Evidence Lower Bound (ELBO) objectives. In this work, we reveal that diffusion model objectives are actually closely related to the ELBO. Specifically, we show that all commonly used diffusion model objectives equate to a weighted integral of ELBOs over different noise levels, where the weighting depends on the specific objective used. Under the condition of monotonic weighting, the connection is even closer: the diffusion objective then equals the ELBO, combined with simple data augmentation, namely Gaussian noise perturbation. We show that this condition holds for a number of state-of-the-art diffusion models. In experiments, we explore new monotonic weightings and demonstrate their effectiveness, achieving state-of-the-art FID scores on the high-resolution ImageNet benchmark.
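As a rough sketch of the central claim, the relationship can be written out in notation that is assumed here and does not appear in the abstract: $\lambda_t$ is the log signal-to-noise ratio at time $t \in [0, 1]$, decreasing from $\lambda_{\max}$ at $t = 0$ to $\lambda_{\min}$ at $t = 1$; $w(\lambda)$ is the weighting function; $\hat{\boldsymbol{\epsilon}}_\theta$ is the noise-prediction network; and $\mathcal{L}(t; \mathbf{x})$ is the negative ELBO of the data after it has been perturbed to noise level $\lambda_t$. Under these assumptions, integrating the weighted loss by parts gives a weighted integral of ELBOs:

```latex
% Weighted diffusion loss (assumed noise-prediction form)
\mathcal{L}_w(\mathbf{x})
  = \frac{1}{2}\,
    \mathbb{E}_{t \sim \mathcal{U}(0,1),\; \boldsymbol{\epsilon} \sim \mathcal{N}(0, \mathbf{I})}
    \left[
      w(\lambda_t)\,\Bigl(-\frac{d\lambda_t}{dt}\Bigr)\,
      \bigl\lVert \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t; \lambda_t) - \boldsymbol{\epsilon} \bigr\rVert_2^2
    \right]

% Since  d\mathcal{L}(t;\mathbf{x})/dt
%   = (1/2) (d\lambda_t/dt) E_\epsilon || \hat\epsilon_\theta - \epsilon ||^2 ,
% the loss equals  -\int_0^1 w(\lambda_t) (d/dt) \mathcal{L}(t;\mathbf{x}) dt .
% Integration by parts then yields:
\mathcal{L}_w(\mathbf{x})
  = \int_0^1 \frac{d\,w(\lambda_t)}{dt}\, \mathcal{L}(t; \mathbf{x})\, dt
    \;+\; w(\lambda_{\max})\, \mathcal{L}(0; \mathbf{x})
    \;+\; \text{const.}
```

Here the constant collects the boundary term at $t = 1$, which is taken not to depend on the model parameters. If $w(\lambda_t)$ is non-decreasing in $t$, then $\frac{d\,w(\lambda_t)}{dt} \ge 0$ and the integral is a non-negative mixture of ELBOs evaluated on noise-perturbed data. This is one way to read the statement that, under monotonic weighting, the objective equals the ELBO combined with Gaussian noise augmentation.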