Diffusion models have demonstrated impressive generative capabilities, but their 'exposure bias' problem, described as the input mismatch between training and sampling, lacks in-depth exploration. In this paper, we systematically investigate the exposure bias problem in diffusion models. We first model the sampling distribution analytically, and from this model we identify the prediction error at each sampling step as the root cause of exposure bias. We then discuss potential solutions to this issue and propose an intuitive metric for quantifying it. Building on this analysis, we propose a simple yet effective training-free method called Epsilon Scaling to alleviate exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned during training by scaling down the network output (Epsilon), thereby mitigating the input mismatch between training and sampling. Experiments across several diffusion frameworks (ADM, DDPM/DDIM, LDM), in unconditional and conditional settings, and with both deterministic and stochastic sampling verify the effectiveness of our method.
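To make "scaling down the network output (Epsilon)" concrete, here is a minimal sketch of a deterministic DDIM sampler (eta = 0) in which the predicted noise is divided by a factor `scale` before each update. Several assumptions apply. The factor is a single constant greater than 1, chosen for simplicity; the paper may use a timestep-dependent factor. The helper names `ddim_step_eps_scaled` and `sample` are hypothetical. `eps_model` is a placeholder for a trained noise-prediction network. Vectors are plain Python lists so the example runs without extra dependencies.

```python
import math
from typing import Callable, List

Vector = List[float]


def ddim_step_eps_scaled(x_t: Vector, eps_pred: Vector, alpha_bar_t: float,
                         alpha_bar_prev: float, scale: float = 1.0) -> Vector:
    """One deterministic DDIM (eta = 0) update with the predicted noise divided by `scale`.

    scale == 1.0 recovers plain DDIM; scale > 1.0 shrinks the network output,
    which is the direction Epsilon Scaling uses (a constant factor is assumed here).
    """
    if scale <= 0.0:
        raise ValueError("scale must be positive")
    # Epsilon Scaling: shrink the network's noise prediction before using it.
    eps = [e / scale for e in eps_pred]
    sqrt_ab_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = math.sqrt(1.0 - alpha_bar_t)
    # Estimate the clean sample x_0 from x_t and the (scaled) noise.
    x0_pred = [(x - sqrt_one_minus_ab_t * e) / sqrt_ab_t for x, e in zip(x_t, eps)]
    # Re-noise the x_0 estimate to the previous (less noisy) timestep.
    sqrt_ab_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_ab_prev = math.sqrt(1.0 - alpha_bar_prev)
    return [sqrt_ab_prev * x0 + sqrt_one_minus_ab_prev * e for x0, e in zip(x0_pred, eps)]


def sample(eps_model: Callable[[Vector, int], Vector], x_T: Vector,
           alpha_bars: List[float], scale: float = 1.0) -> Vector:
    """Run the sampler from t = T down to t = 0.

    alpha_bars[t] is the cumulative signal level at step t (decreasing in t,
    with alpha_bars[0] close to 1); eps_model is a hypothetical placeholder network.
    """
    x = list(x_T)
    for t in range(len(alpha_bars) - 1, 0, -1):
        eps = eps_model(x, t)
        x = ddim_step_eps_scaled(x, eps, alpha_bars[t], alpha_bars[t - 1], scale)
    return x
```

Because the method is training-free, the only change relative to a standard DDIM loop is the single division of `eps_pred` by `scale`; the trained network and the noise schedule are left untouched.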