Diffusion models have demonstrated impressive generative capabilities, but their "exposure bias" problem, the mismatch between the inputs seen during training and those seen during sampling, has not been explored in depth. In this paper, we systematically investigate exposure bias in diffusion models by first analytically modelling the sampling distribution, and on that basis we identify the prediction error at each sampling step as the root cause of exposure bias. Furthermore, we discuss potential solutions to this issue and propose an intuitive metric for it. Alongside this analysis, we propose a simple yet effective training-free method called Epsilon Scaling to alleviate exposure bias. We show that Epsilon Scaling explicitly moves the sampling trajectory closer to the vector field learned in the training phase by scaling down the network output (Epsilon), mitigating the input mismatch between training and sampling. Experiments on various diffusion frameworks (ADM, DDPM/DDIM, EDM, LDM), in unconditional and conditional settings, and with deterministic and stochastic sampling verify the effectiveness of our method. The code is available at https://github.com/forever208/ADM-ES and https://github.com/forever208/EDM-ES
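To make the idea of scaling down the network output concrete, the following is a minimal sketch of a single DDPM-style ancestral sampling step in which the predicted noise is divided by a factor before use. It is an illustration under stated assumptions, not the released implementation: the function name `ddpm_step_with_eps_scaling`, the constant `scale` value, and the choice of sampling variance equal to `betas[t]` are all hypothetical, and the abstract does not specify how the scaling factor is chosen or scheduled.

```python
import numpy as np

def ddpm_step_with_eps_scaling(x_t, eps_pred, t, betas, scale=1.005, rng=None):
    """One DDPM-style reverse step using a scaled-down noise prediction.

    x_t      : current noisy sample (NumPy array)
    eps_pred : network noise prediction for (x_t, t), same shape as x_t
    t        : integer timestep index into betas
    betas    : 1-D array holding the forward-process noise schedule
    scale    : hypothetical scaling factor; values > 1 shrink eps_pred
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)

    # Assumed form of the scaling: divide the prediction by a constant.
    eps_scaled = eps_pred / scale

    # Standard DDPM posterior mean, computed with the scaled prediction.
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_scaled) / np.sqrt(alphas[t])

    if t == 0:
        # No noise is added at the final step.
        return mean

    rng = np.random.default_rng() if rng is None else rng
    sigma = np.sqrt(betas[t])  # assumed variance choice: sigma_t^2 = beta_t
    return mean + sigma * rng.standard_normal(x_t.shape)
```

Setting `scale=1.0` recovers the unmodified step, so the effect of the scaling can be compared directly on the same inputs.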