Diffusion models have emerged as the de facto choice for generating visual signals. However, training a single model to predict noise across a wide range of noise levels is challenging: it requires many iterations and incurs substantial computational cost. Various approaches, such as loss-weighting strategies and architectural refinements, have been introduced to expedite convergence. In this study, we propose a novel approach to designing the noise schedule to improve the training of diffusion models. Our key insight is that importance sampling of the logarithm of the signal-to-noise ratio (logSNR), which is theoretically equivalent to a modified noise schedule, is particularly beneficial for training efficiency when the sampling frequency is increased around $\log \text{SNR}=0$. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule. Furthermore, we highlight the advantages of our noise schedule design on the ImageNet benchmark, showing that the designed schedule consistently benefits different prediction targets.
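The idea of sampling logSNR directly, with more mass near $\log \text{SNR}=0$, can be illustrated with a minimal sketch. Several things here are assumptions made only for illustration and are not taken from the abstract: the Laplace density and its scale `b`, the variance-preserving parameterization ($\alpha^2 + \sigma^2 = 1$), and all function names, which are hypothetical. The baseline is the standard cosine schedule with a uniformly sampled timestep.

```python
import math
import random


def cosine_logsnr(t: float) -> float:
    """logSNR of the cosine schedule at time t in (0, 1).

    With alpha = cos(pi*t/2) and sigma = sin(pi*t/2), SNR = cot^2(pi*t/2).
    """
    return -2.0 * math.log(math.tan(math.pi * t / 2.0))


def sample_logsnr_laplace(rng: random.Random, mu: float = 0.0, b: float = 0.5) -> float:
    """Draw logSNR from a Laplace(mu, b) density via inverse-CDF sampling.

    The Laplace form and b = 0.5 are illustrative assumptions; any density
    peaked at logSNR = 0 would demonstrate the same point.
    """
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Clamp the log argument to avoid log(0) in the (rare) case u == -0.5.
    tail = max(1.0 - 2.0 * abs(u), 1e-12)
    return mu - b * math.copysign(1.0, u) * math.log(tail)


def logsnr_to_alpha_sigma(logsnr: float) -> tuple[float, float]:
    """Variance-preserving coefficients: alpha^2 = sigmoid(logsnr), sigma^2 = 1 - alpha^2."""
    alpha_sq = 1.0 / (1.0 + math.exp(-logsnr))
    return math.sqrt(alpha_sq), math.sqrt(1.0 - alpha_sq)


def fraction_near_zero(samples: list[float], width: float = 1.0) -> float:
    """Fraction of logSNR samples with |logSNR| < width."""
    return sum(abs(s) < width for s in samples) / len(samples)


rng = random.Random(0)
n = 20000

# Baseline: uniform timestep mapped through the cosine schedule.
# t is clipped away from 0 and 1, where logSNR diverges.
cosine_samples = [
    cosine_logsnr(min(max(rng.random(), 1e-6), 1.0 - 1e-6)) for _ in range(n)
]

# Importance-sampled logSNR concentrated around 0.
laplace_samples = [sample_logsnr_laplace(rng) for _ in range(n)]

frac_cosine = fraction_near_zero(cosine_samples)
frac_laplace = fraction_near_zero(laplace_samples)
print(f"cosine  schedule: fraction with |logSNR| < 1 = {frac_cosine:.3f}")
print(f"laplace sampling: fraction with |logSNR| < 1 = {frac_laplace:.3f}")

# A sampled logSNR is turned into noising coefficients for x_t = alpha*x_0 + sigma*eps.
alpha, sigma = logsnr_to_alpha_sigma(laplace_samples[0])
```

Under these assumptions, the Laplace sampler places roughly 86% of training samples within $|\log \text{SNR}| < 1$ (analytically $1 - e^{-2}$), compared with roughly 31% for the cosine schedule with uniform timesteps. Drawing logSNR from a chosen density and converting it to $(\alpha, \sigma)$ is what makes the importance-sampling view interchangeable with a modified noise schedule.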