Diffusion models have demonstrated impressive generative capabilities, but their 'exposure bias' problem, described as the input mismatch between training and sampling, has not been explored in depth. In this paper, we systematically investigate the exposure bias problem in diffusion models. We first model the sampling distribution analytically and, based on this model, identify the prediction error at each sampling step as the root cause of exposure bias. We then discuss potential solutions to this issue and propose an intuitive metric for it. In addition, we propose a simple yet effective training-free method called Epsilon Scaling to alleviate exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output (Epsilon), which mitigates the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDPM/DDIM, LDM), in unconditional and conditional settings, and with both deterministic and stochastic sampling verify the effectiveness of our method.
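To illustrate the idea of scaling down the network output, the following is a minimal sketch of one standard DDPM-style reverse step in which the predicted noise is divided by a constant before it is used. It is a sketch under stated assumptions, not the paper's implementation: the function name `ddpm_step`, the parameter name `eps_scale`, and the numeric values are hypothetical, and the abstract does not specify how the scaling factor is chosen or scheduled.

```python
import math
import random


def ddpm_step(x_t, eps_pred, alpha_t, alpha_bar_t, sigma_t, eps_scale=1.0, rng=None):
    """One DDPM-style reverse step on a flat list of floats.

    eps_scale is a hypothetical parameter: values above 1 shrink the
    predicted noise before the update, and 1.0 recovers the plain step.
    """
    rng = rng or random.Random(0)
    # Coefficient on the predicted noise: beta_t / sqrt(1 - alpha_bar_t),
    # with beta_t = 1 - alpha_t.
    coef = (1.0 - alpha_t) / math.sqrt(1.0 - alpha_bar_t)
    x_prev = []
    for x, eps in zip(x_t, eps_pred):
        eps_scaled = eps / eps_scale  # scale down the network output
        mean = (x - coef * eps_scaled) / math.sqrt(alpha_t)
        # sigma_t = 0 gives a deterministic step; otherwise add Gaussian noise.
        x_prev.append(mean + sigma_t * rng.gauss(0.0, 1.0))
    return x_prev


# Illustrative values only; eps_pred stands in for a network prediction.
x_t = [1.0, -2.0]
eps_pred = [0.5, 0.25]
plain = ddpm_step(x_t, eps_pred, alpha_t=0.9, alpha_bar_t=0.5, sigma_t=0.0)
scaled = ddpm_step(x_t, eps_pred, alpha_t=0.9, alpha_bar_t=0.5, sigma_t=0.0,
                   eps_scale=2.0)
```

Because the change only touches how the predicted noise enters the update, it requires no retraining, which is consistent with the training-free claim. A factor of 2.0 is used here only to make the effect visible; it should not be read as a recommended setting.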