Diffusion-based generative models have emerged as powerful tools for generative modeling. Despite extensive research on denoising across timesteps and noise levels, disagreement persists about the relative difficulty of the denoising tasks: some studies argue that lower timesteps present the more challenging tasks, while others contend that higher timesteps are more difficult. To resolve this disagreement, we comprehensively examine task difficulty, focusing on convergence behavior and on the change in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at lower timesteps is characterized by slower convergence and higher relative entropy, indicating greater task difficulty at these timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing on curriculum learning, to improve the training of diffusion models. By organizing timesteps or noise levels into clusters and training in ascending order of difficulty, we obtain an order-aware training regime that progresses from easier to harder denoising tasks, in contrast to the conventional approach of training on all timesteps simultaneously. Our approach improves performance and accelerates convergence by leveraging the benefits of curriculum learning, while remaining orthogonal to existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments on image generation tasks, including unconditional, class-conditional, and text-to-image generation.
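The easy-to-hard scheme described in the abstract can be illustrated with a minimal sketch of a curriculum timestep sampler. This is not the authors' implementation: the function names (`make_clusters`, `stage_for_step`, `sample_timesteps`), the uniform contiguous clustering, the fixed number of steps per stage, and the choice to keep earlier clusters active once harder ones are unlocked are all assumptions made for illustration. The only thing taken from the abstract is the ordering: higher timesteps are treated as easier and lower timesteps as harder.

```python
import random


def make_clusters(num_timesteps, num_clusters):
    """Split timesteps 0..num_timesteps-1 into contiguous clusters.

    Assumption: clusters are near-equal in size. They are returned
    easiest first (highest timesteps) to hardest last (lowest timesteps),
    following the abstract's finding that lower timesteps are harder.
    """
    bounds = [round(i * num_timesteps / num_clusters)
              for i in range(num_clusters + 1)]
    clusters = [range(bounds[i], bounds[i + 1]) for i in range(num_clusters)]
    return clusters[::-1]


def stage_for_step(step, steps_per_stage, num_clusters):
    """Map a training step to a curriculum stage (hypothetical fixed pacing)."""
    return min(step // steps_per_stage, num_clusters - 1)


def sample_timesteps(clusters, stage, batch_size, rng):
    """Sample timesteps uniformly from all clusters unlocked up to `stage`.

    Assumption: earlier (easier) clusters stay in the pool when a harder
    cluster is added; the abstract does not specify this detail.
    """
    active = [t for cluster in clusters[: stage + 1] for t in cluster]
    return [rng.choice(active) for _ in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    clusters = make_clusters(num_timesteps=1000, num_clusters=4)
    for step in (0, 100, 200, 300):
        stage = stage_for_step(step, steps_per_stage=100, num_clusters=4)
        batch = sample_timesteps(clusters, stage, batch_size=8, rng=rng)
        print(f"step={step} stage={stage} min_t={min(batch)} max_t={max(batch)}")
```

In a training loop, the sampled timesteps would replace the usual uniform draw over all timesteps before the standard denoising loss is computed. Once the final stage is reached, the sampler covers every timestep, so training ends in the conventional regime.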