Denoising diffusion models have become a mainstream approach to image generation; however, training them often suffers from slow convergence. In this paper, we find that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat diffusion training as a multi-task learning problem and introduce a simple yet effective approach referred to as Min-SNR-$\gamma$. This method adapts the loss weights of timesteps based on clamped signal-to-noise ratios, which effectively balances the conflicts among timesteps. Our results demonstrate a significant improvement in convergence speed, 3.4$\times$ faster than previous weighting strategies. The method is also more effective, achieving a new record FID score of 2.06 on the ImageNet $256\times256$ benchmark using smaller architectures than those employed in the previous state of the art.
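As a rough illustration of weighting timesteps by clamped signal-to-noise ratios, the following minimal sketch computes one weight per timestep. Several details are assumptions that the abstract does not state: the linear beta schedule and its endpoints, the definition $\mathrm{SNR}(t) = \bar\alpha_t / (1 - \bar\alpha_t)$, the value $\gamma = 5$, and the use of the clamped value directly as the loss weight (the exact form may depend on the prediction target). The function names are hypothetical.

```python
def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed noise schedule: betas spaced linearly between two endpoints."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def snr_per_timestep(betas):
    """SNR(t) = alpha_bar_t / (1 - alpha_bar_t), with alpha_bar_t the
    cumulative product of (1 - beta_s) for s <= t (assumed definition)."""
    snrs = []
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
        snrs.append(alpha_bar / (1.0 - alpha_bar))
    return snrs


def clamped_snr_weights(snrs, gamma=5.0):
    """Clamp each SNR from above at gamma, so low-noise timesteps
    (very large SNR) cannot dominate the total loss."""
    return [min(snr, gamma) for snr in snrs]


betas = linear_beta_schedule()
snrs = snr_per_timestep(betas)
weights = clamped_snr_weights(snrs, gamma=5.0)  # gamma = 5 is illustrative
```

In a training loop, the per-sample loss at timestep `t` would be multiplied by `weights[t]` before averaging. Under these assumptions, early (low-noise) timesteps all receive the weight `gamma`, while late (high-noise) timesteps keep their small SNR as the weight.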