As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance of the stochastic gradients differs significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
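The adaptive sampling idea above can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact algorithm: the class name, the exponential-moving-average impact estimate, the softmax temperature, and the `loss_before`/`loss_after` bookkeeping are all assumptions introduced here for clarity.

```python
import numpy as np

class AdaptiveTimestepSampler:
    """Hypothetical sketch of non-uniform diffusion timestep sampling.

    Tracks a per-timestep 'impact' score (how much recent gradient updates
    at that timestep reduced the objective) and samples timesteps with
    probability proportional to a softmax over those scores.
    """

    def __init__(self, num_timesteps, ema_decay=0.9, temperature=1.0):
        self.num_timesteps = num_timesteps
        self.ema_decay = ema_decay        # smoothing for the impact estimate
        self.temperature = temperature    # lower -> more peaked sampling
        # Exponential moving average of per-timestep objective decrease.
        self.impact = np.zeros(num_timesteps)

    def sample(self, batch_size, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # Softmax over tracked impact: high-impact timesteps are drawn more often.
        logits = self.impact / self.temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(self.num_timesteps, size=batch_size, p=probs)

    def update(self, t, loss_before, loss_after):
        # Record how much the gradient update at timestep t reduced the loss.
        decrease = loss_before - loss_after
        self.impact[t] = (self.ema_decay * self.impact[t]
                          + (1.0 - self.ema_decay) * decrease)
```

With a uniform (zero) impact vector this reduces to uniform sampling; as some timesteps repeatedly yield larger loss decreases, sampling concentrates on them, mimicking the prioritization the abstract describes.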