Diffusion models have achieved remarkable success in generative modeling. Despite their relatively stable training, the diffusion loss does not indicate absolute data-fitting quality: its optimal value is typically nonzero and unknown, so a large loss can be mistaken for insufficient model capacity when it actually reflects a large optimal loss. In this work, we argue that estimating the optimal loss value is necessary for diagnosing and improving diffusion models. We first derive the optimal loss in closed form under a unified formulation of diffusion models and develop effective estimators for it, including a stochastic variant that scales to large datasets with proper control of variance and bias. With this tool, we obtain an inherent metric for diagnosing the training quality of mainstream diffusion model variants, and we develop a more performant training schedule based on the optimal loss. Moreover, using models with 120M to 1.5B parameters, we find that the power law is better exhibited after subtracting the optimal loss from the actual training loss, suggesting a more principled setting for investigating scaling laws of diffusion models.
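To make the idea of a nonzero optimal loss concrete, here is a minimal sketch. It is not the paper's estimator or its unified formulation. It assumes the data distribution is the empirical distribution of a small finite training set, the forward process is x_t = alpha * x0 + sigma * eps with eps ~ N(0, I), and the model predicts x0 under a squared-error loss at a single noise level. Under these assumptions the optimal denoiser is the posterior mean E[x0 | x_t], a softmax-weighted average of the training points. Its expected error, estimated here by Monte Carlo, is the loss floor that no model can go below. The function names `posterior_mean` and `optimal_x0_loss` are hypothetical.

```python
import math
import random


def posterior_mean(x_t, data, alpha, sigma):
    """Optimal x0-denoiser E[x0 | x_t] when x0 is uniform over `data` and
    x_t = alpha * x0 + sigma * eps, eps ~ N(0, I) (assumed forward process)."""
    # Log-likelihoods log N(x_t; alpha * x_i, sigma^2 I), up to a shared constant.
    logw = [
        -sum((a - alpha * b) ** 2 for a, b in zip(x_t, x)) / (2 * sigma ** 2)
        for x in data
    ]
    m = max(logw)  # subtract the max before exponentiating (log-sum-exp trick)
    w = [math.exp(l - m) for l in logw]
    z = sum(w)
    dim = len(x_t)
    # Posterior-weighted average of the training points.
    return [sum(wi * x[d] for wi, x in zip(w, data)) / z for d in range(dim)]


def optimal_x0_loss(data, alpha, sigma, n_samples=1000, seed=0):
    """Monte Carlo estimate of min_D E||D(x_t) - x0||^2 at one noise level.
    Each query sums over the whole dataset, so the cost is O(len(data))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0 = rng.choice(data)
        x_t = [alpha * v + sigma * rng.gauss(0.0, 1.0) for v in x0]
        mu = posterior_mean(x_t, data, alpha, sigma)
        total += sum((p - v) ** 2 for p, v in zip(mu, x0))
    return total / n_samples


if __name__ == "__main__":
    toy = [[-1.0], [1.0]]  # two 1-D training points
    for s in (0.1, 1.0, 10.0):
        print(s, optimal_x0_loss(toy, alpha=1.0, sigma=s))
```

On this toy set, the floor is near 0 at low noise because x_t identifies its source point. It approaches E||x0||^2 = 1 at high noise, where the best prediction is the data mean. A trained model's loss at each noise level would be compared against this floor rather than against zero. Because every query sums over all training points, this exact approach becomes expensive on large datasets. That cost is what motivates the stochastic variant mentioned in the abstract, which this sketch does not attempt to reproduce.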