Score-based diffusion models have become the most popular approach to deep generative modeling of images, largely due to their empirical performance and reliability. Recently, a number of theoretical works \citep{chen2022, Chen2022ImprovedAO, Chenetal23flowode, benton2023linear} have shown that diffusion models can sample efficiently, given $L^2$-accurate score estimates. The score-matching objective naturally approximates the true score in $L^2$, but the sample complexity of existing bounds depends \emph{polynomially} on the data radius and the desired Wasserstein accuracy. By contrast, the time complexity of sampling is only logarithmic in these parameters. We show that estimating the score in $L^2$ \emph{requires} this polynomial dependence, but that a number of samples scaling polylogarithmically in the Wasserstein accuracy nonetheless suffices for sampling. Specifically, with a polylogarithmic number of samples, the empirical risk minimizer (ERM) of the score-matching objective is $L^2$-accurate on all but a $\delta$ fraction of the probability mass of the true distribution, and this weaker guarantee is sufficient for efficient sampling.
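
The contrast between the two guarantees can be made concrete as follows. This is only a sketch under assumed notation that the text above does not fix: $q$ denotes the (smoothed) data distribution at a given noise level, $s$ its true score, $\hat{s}$ the learned estimate, and $\varepsilon$ an error tolerance. The standard $L^2$ guarantee would read
\[
  \mathbb{E}_{x \sim q}\bigl[\, \|\hat{s}(x) - s(x)\|_2^2 \,\bigr] \le \varepsilon^2 ,
\]
whereas one natural reading of the weaker guarantee is
\[
  \Pr_{x \sim q}\bigl[\, \|\hat{s}(x) - s(x)\|_2 > \varepsilon \,\bigr] \le \delta .
\]
By Markov's inequality, the first bound implies the second with $\varepsilon$ replaced by $\varepsilon/\sqrt{\delta}$. The second, however, allows $\hat{s}$ to be arbitrarily inaccurate on a set of probability at most $\delta$, so it does not imply the first.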