Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the generation error. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. A refined sampling error analysis for variance exploding models is also provided. Combining these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting that qualitatively agree with the ones used in [Karras et al. 2022]. It also provides some perspectives on why the time and variance schedule used in [Karras et al. 2022] could be better tuned than the pioneering version in [Song et al. 2020].