To achieve the highest perceptual quality, state-of-the-art diffusion models are optimized with objectives that look very different from the maximum likelihood and the Evidence Lower Bound (ELBO) objectives. In this work, we reveal that diffusion model objectives are actually closely related to the ELBO. Specifically, we show that all commonly used diffusion model objectives equate to a weighted integral of ELBOs over different noise levels, where the weighting depends on the specific objective used. Under the condition of monotonic weighting, the connection is even closer: the diffusion objective then equals the ELBO, combined with simple data augmentation, namely Gaussian noise perturbation. We show that this condition holds for a number of state-of-the-art diffusion models. In experiments, we explore new monotonic weightings and demonstrate their effectiveness, achieving state-of-the-art FID scores on the high-resolution ImageNet benchmark.
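The "weighted integral of ELBOs" claim can be sketched with one integration by parts. The notation below is an assumption introduced for illustration and does not appear in the abstract: $\lambda_t$ is the log signal-to-noise ratio, taken to decrease in $t \in [0,1]$; $\mathbf{z}_t$ is the data $\mathbf{x}$ perturbed with Gaussian noise $\boldsymbol{\epsilon}$ at level $t$; $\hat{\boldsymbol{\epsilon}}_\theta$ is a noise-prediction network; $w$ is the weighting function; and $\mathcal{L}(t;\mathbf{x})$ is the negative ELBO (up to a constant) for the noise-perturbed data at level $t$. The sketch also assumes the identity $\frac{d}{dt}\mathcal{L}(t;\mathbf{x}) = \tfrac{1}{2}\frac{d\lambda_t}{dt}\,\mathbb{E}_{\boldsymbol{\epsilon}}\lVert \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t;\lambda_t) - \boldsymbol{\epsilon} \rVert_2^2$.

```latex
\begin{align}
\mathcal{L}_w(\mathbf{x})
  &= \tfrac{1}{2}\,\mathbb{E}_{t \sim \mathcal{U}(0,1),\,\boldsymbol{\epsilon} \sim \mathcal{N}(0,\mathbf{I})}
     \left[ w(\lambda_t)\left(-\frac{d\lambda_t}{dt}\right)
     \lVert \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t;\lambda_t) - \boldsymbol{\epsilon} \rVert_2^2 \right] \\
  &= -\int_0^1 w(\lambda_t)\,\frac{d}{dt}\mathcal{L}(t;\mathbf{x})\,dt
     && \text{(assumed derivative identity)} \\
  &= \int_0^1 \frac{d\,w(\lambda_t)}{dt}\,\mathcal{L}(t;\mathbf{x})\,dt
     + w(\lambda_0)\,\mathcal{L}(0;\mathbf{x})
     - w(\lambda_1)\,\mathcal{L}(1;\mathbf{x})
     && \text{(integration by parts)}
\end{align}
```

Under these assumptions, if $w(\lambda_t)$ is non-decreasing in $t$, then $\frac{d\,w(\lambda_t)}{dt} \ge 0$ and every ELBO term in the last line carries a non-negative weight. That is the sense in which a monotonic weighting corresponds to an ELBO objective on data augmented with Gaussian noise at randomly drawn levels. The boundary term $w(\lambda_1)\,\mathcal{L}(1;\mathbf{x})$ can be treated as a constant when the prior at $t=1$ has no trainable parameters.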