To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that typically look very different from the maximum likelihood and the Evidence Lower Bound (ELBO) objectives. In this work, we reveal that diffusion model objectives are actually closely related to the ELBO. Specifically, we show that all commonly used diffusion model objectives equate to a weighted integral of ELBOs over different noise levels, where the weighting depends on the specific objective used. Under the condition of monotonic weighting, the connection is even closer: the diffusion objective then equals the ELBO, combined with simple data augmentation, namely Gaussian noise perturbation. We show that this condition holds for a number of state-of-the-art diffusion models. In experiments, we explore new monotonic weightings and demonstrate their effectiveness, achieving state-of-the-art FID scores on the high-resolution ImageNet benchmark.
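To make the "weighted integral of ELBOs" statement concrete, the relationship can be sketched as follows. The notation is assumed here for illustration and does not appear in the abstract: $\lambda$ is the log signal-to-noise ratio, $\lambda_t$ is its value at time $t \in [0, 1]$ (decreasing in $t$), $w(\lambda)$ is the weighting function, $\hat{\epsilon}_\theta$ is a noise-prediction network, $z_\lambda$ is the data $x$ perturbed to noise level $\lambda$, and $\mathcal{L}(t; x)$ is the negative ELBO (up to a constant) for data noised to level $t$. The second equality is an integration by parts, under the further assumption that $\frac{d}{dt}\mathcal{L}(t; x) = \frac{1}{2}\frac{d\lambda_t}{dt}\,\mathbb{E}_{\epsilon}\,\|\hat{\epsilon}_\theta(z_{\lambda_t}; \lambda_t) - \epsilon\|_2^2$.

```latex
\begin{align}
\mathcal{L}_w(x)
  &= \frac{1}{2}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
     \left[ \int_{\lambda_{\min}}^{\lambda_{\max}}
       w(\lambda)\, \bigl\| \hat{\epsilon}_\theta(z_\lambda; \lambda) - \epsilon \bigr\|_2^2
     \, d\lambda \right] \\
  &= \int_0^1 \frac{d\, w(\lambda_t)}{dt}\, \mathcal{L}(t; x)\, dt
     \;+\; w(\lambda_{\max})\, \mathcal{L}(0; x) \;+\; \text{const.}
\end{align}
```

Read this way, "monotonic weighting" corresponds to $\frac{d}{dt} w(\lambda_t) \ge 0$: every term $\mathcal{L}(t; x)$ then enters with a non-negative weight, so the objective behaves like an ELBO evaluated on data perturbed with Gaussian noise at randomly drawn noise levels.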