Diffusion models have demonstrated compelling generation quality by optimizing the variational lower bound through a simple denoising score matching loss. In this paper, we provide theoretical evidence that the prevailing practice of using a constant loss weighting strategy in diffusion models leads to biased estimation during training. Simply optimizing the denoising network to predict Gaussian noise with constant weighting may hinder precise estimation of the original images. To address this issue, we propose an elegant and effective weighting strategy grounded in a theoretically unbiased principle. Moreover, we conduct a comprehensive and systematic study of the inherent bias arising from constant loss weighting, examining its existence, its impact, and its causes. These analyses are expected to advance our understanding and demystify the inner workings of diffusion models. Through empirical evaluation, we demonstrate that our proposed debiased estimation method significantly enhances sample quality without relying on complex techniques, and is more efficient than the baseline method in both training and sampling.
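To make the link between noise prediction and original-image estimation concrete, the following is a minimal sketch under the standard DDPM forward process x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε. It is not the weighting proposed in this paper, which the abstract does not specify. The linear beta schedule, the helper names `x0_from_eps` and `error_ratio`, and the synthetic "imperfect" prediction `eps_hat` are all assumptions chosen for illustration. The sketch checks numerically that a fixed noise-prediction error maps to an error in the recovered x_0 that is scaled by (1 - ᾱ_t) / ᾱ_t, so the same noise error costs very different amounts at different timesteps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed DDPM-style linear beta schedule; the abstract does not specify one.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)


def x0_from_eps(x_t, eps_hat, a_bar):
    """Invert x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps for x0."""
    return (x_t - np.sqrt(1.0 - a_bar) * eps_hat) / np.sqrt(a_bar)


# Synthetic data: a toy "image", the true noise, and a hypothetical
# imperfect noise prediction with a small fixed error.
x0 = rng.standard_normal(16)
eps = rng.standard_normal(16)
eps_hat = eps + 0.01 * rng.standard_normal(16)


def error_ratio(t):
    """Squared x0-estimation error divided by squared noise-prediction error."""
    a_bar = alpha_bar[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    x0_hat = x0_from_eps(x_t, eps_hat, a_bar)
    return np.sum((x0 - x0_hat) ** 2) / np.sum((eps - eps_hat) ** 2)


for t in (0, 499, 999):
    # The two printed values should agree: the ratio equals (1 - a_bar) / a_bar.
    print(t, error_ratio(t), (1.0 - alpha_bar[t]) / alpha_bar[t])
```

Under these assumptions the ratio grows by several orders of magnitude from small to large t, which is one way to see why a constant weight on the noise-prediction loss does not correspond to a uniform penalty on the error in the estimated original image.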