Diffusion models have been successfully adapted to text generation by mapping discrete text into a continuous space. However, non-negligible gaps remain between training and inference because the forward process is absent during inference: the model predicts from the noise produced by its own previous reverse steps rather than from the noise computed by the forward process. In addition, the downsampling strategy widely used to speed up inference causes a mismatch between the diffusion trajectories seen in training and those followed at inference. To understand and mitigate these two types of training-inference discrepancy, we conduct a thorough preliminary study. Based on our observations, we propose two simple yet effective methods to bridge these gaps, Distance Penalty and Adaptive Decay Sampling. Extensive experiments on \textbf{6} generation tasks confirm the superiority of our methods, which achieve $100\times \rightarrow 200\times$ speedup with better performance.