Denoising Task Difficulty-based Curriculum for Training Diffusion Models

Diffusion-based generative models have emerged as powerful tools in generative modeling. Despite extensive research on denoising across timesteps and noise levels, a conflict persists regarding the relative difficulty of the denoising tasks. While some studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulty, focusing on convergence behavior and on changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps is characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing on curriculum learning, to enhance the training of diffusion models. By organizing timesteps or noise levels into clusters and training models in ascending order of difficulty, we obtain an order-aware training regime that progresses from easier to harder denoising tasks, in contrast to the conventional approach of training diffusion models on all timesteps simultaneously. Our approach improves performance and speeds up convergence by leveraging the benefits of curriculum learning, while remaining orthogonal to existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments on image generation tasks, including unconditional, class-conditional, and text-to-image generation.
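
To make the easy-to-hard scheme concrete, the following is a minimal sketch of how timestep sampling might be restricted during a curriculum. It is not the exact procedure of this work: the uniform split into contiguous clusters, the choice to accumulate earlier clusters as stages advance, and the names `make_clusters`, `active_timesteps`, and `sample_timestep` are all illustrative assumptions. The only property taken from the abstract is the ordering, in which higher (noisier) timesteps are treated as easier and are therefore trained first.

```python
import random


def make_clusters(num_timesteps, num_clusters):
    """Split timesteps 0..T-1 into contiguous, near-equal clusters.

    Assumption: a uniform split; other clusterings (e.g. by noise level) are possible.
    """
    bounds = [round(i * num_timesteps / num_clusters) for i in range(num_clusters + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_clusters)]


def active_timesteps(clusters, stage):
    """Return the timesteps available for training at a given curriculum stage.

    Stage 0 uses only the highest-timestep cluster (assumed easiest); each later
    stage adds the next lower cluster, until all timesteps are in use.
    Assumption: earlier clusters stay active rather than being replaced.
    """
    stage = min(stage, len(clusters) - 1)
    active = clusters[len(clusters) - 1 - stage:]
    return [t for cluster in active for t in cluster]


def sample_timestep(clusters, stage, rng=random):
    """Draw one training timestep uniformly from the currently active clusters."""
    return rng.choice(active_timesteps(clusters, stage))
```

In a training loop, `sample_timestep(clusters, stage)` would take the place of the usual uniform draw over all timesteps, with `stage` advanced on a schedule of the practitioner's choosing. Once the final stage is reached, sampling covers every timestep, which matches conventional training.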
