To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that look very different from maximum likelihood and the Evidence Lower Bound (ELBO). In this work, we reveal that diffusion model objectives are in fact closely related to the ELBO. Specifically, we show that all commonly used diffusion model objectives equate to a weighted integral of ELBOs over different noise levels, where the weighting depends on the specific objective used. Under the condition of monotonic weighting, the connection is even closer: the diffusion objective then equals the ELBO combined with simple data augmentation, namely Gaussian noise perturbation. We show that this condition holds for a number of state-of-the-art diffusion models. In experiments, we explore new monotonic weightings and demonstrate their effectiveness, achieving state-of-the-art FID scores on the high-resolution ImageNet benchmark.
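The central identity can be sketched as a short derivation. The notation here is an assumption, not defined in this abstract: $\lambda_t$ is the log signal-to-noise ratio, decreasing from $\lambda_{\max}$ at $t=0$ to $\lambda_{\min}$ at $t=1$; $\hat{\boldsymbol{\epsilon}}_\theta$ is a noise-prediction network; $w$ is the weighting function; and $\mathcal{L}(t;\mathbf{x})$ is the KL divergence between the forward process and the model over the latents from time $t$ to $1$, taken in the continuous-time limit. Treat it as an illustrative sketch rather than the paper's exact statement.

```latex
% Assumed notation (see lead-in): lambda_t = log-SNR, epsilon-prediction parameterization.
% Weighted diffusion loss; w \equiv 1 recovers the continuous-time negative ELBO.
\mathcal{L}_w(\mathbf{x}) = \tfrac{1}{2}\,
  \mathbb{E}_{t \sim \mathcal{U}(0,1),\; \boldsymbol{\epsilon} \sim \mathcal{N}(0,\mathbf{I})}
  \left[ w(\lambda_t)\left(-\frac{d\lambda_t}{dt}\right)
  \big\lVert \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t;\lambda_t) - \boldsymbol{\epsilon} \big\rVert_2^2 \right]

% Per-noise-level loss over latents z_t, ..., z_1, and its time derivative.
\mathcal{L}(t;\mathbf{x}) = D_{\mathrm{KL}}\big(q(\mathbf{z}_{t,\dots,1}\mid\mathbf{x}) \,\|\, p(\mathbf{z}_{t,\dots,1})\big),
\qquad
\frac{d}{dt}\mathcal{L}(t;\mathbf{x}) = \tfrac{1}{2}\,\frac{d\lambda_t}{dt}\,
  \mathbb{E}_{\boldsymbol{\epsilon}}\big\lVert \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t;\lambda_t) - \boldsymbol{\epsilon} \big\rVert_2^2

% Integration by parts; lambda_0 = lambda_max, and w(lambda_min) L(1;x) does not depend on theta.
\mathcal{L}_w(\mathbf{x}) = -\int_0^1 w(\lambda_t)\,\frac{d}{dt}\mathcal{L}(t;\mathbf{x})\,dt
  = \int_0^1 \frac{d\,w(\lambda_t)}{dt}\,\mathcal{L}(t;\mathbf{x})\,dt
  + w(\lambda_{\max})\,\mathcal{L}(0;\mathbf{x}) + \text{const.}
```

Read this way, the weighted objective is an integral of per-noise-level terms $\mathcal{L}(t;\mathbf{x})$. When $w(\lambda_t)$ is monotonically non-decreasing in $t$, the factor $d\,w(\lambda_t)/dt \ge 0$ behaves like an unnormalized density over noise levels. The objective then becomes an average of ELBO-type terms on Gaussian-noise-perturbed data, which is the monotonic-weighting case described above.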