From an uncertainty quantification (UQ) perspective, we show that score-based generative models (SGMs) are provably robust to the multiple sources of error that arise in practical implementations. Our primary tool is the Wasserstein uncertainty propagation (WUP) theorem, a model-form UQ bound that describes how the $L^2$ error incurred in learning the score function propagates, under the evolution of the Fokker-Planck equation, to a Wasserstein-1 ($\mathbf{d}_1$) ball around the true data distribution. We show how errors due to (a) finite sample approximation, (b) early stopping, (c) the choice of score-matching objective, (d) the expressiveness of the score function parametrization, and (e) the choice of reference distribution impact the quality of the generative model, as measured by a $\mathbf{d}_1$ bound expressed in terms of computable quantities. The WUP theorem relies on Bernstein estimates for Hamilton-Jacobi-Bellman partial differential equations (PDEs) and on the regularizing properties of diffusion processes. Specifically, PDE regularity theory shows that stochasticity is the key mechanism ensuring that SGM algorithms are provably robust. The WUP theorem applies to integral probability metrics beyond $\mathbf{d}_1$, such as the total variation distance and the maximum mean discrepancy. Sample complexity and generalization bounds in $\mathbf{d}_1$ follow directly from the WUP theorem. Our approach requires minimal assumptions, is agnostic to the manifold hypothesis, and avoids absolute continuity assumptions on the target distribution. Additionally, our results clarify the trade-offs among the multiple error sources in SGMs.
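To make the $\mathbf{d}_1$ metric and the finite-sample error in (a) concrete, the following is a minimal illustrative sketch, not the method of this work: it computes the exact Wasserstein-1 distance between two equal-size empirical measures on the real line, where the optimal coupling simply matches sorted samples. The function name `wasserstein1_1d` and the Gaussian test data are hypothetical choices for illustration only; the WUP theorem itself concerns general dimensions and the score-learning error, which this snippet does not model.

```python
import random


def wasserstein1_1d(xs, ys):
    """Exact d_1 between two equal-size empirical measures on the real line.

    Illustrative sketch (hypothetical helper): in 1D the monotone coupling,
    which pairs the i-th smallest point of xs with the i-th smallest of ys,
    is optimal, so d_1 = (1/n) * sum_i |x_(i) - y_(i)|.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


if __name__ == "__main__":
    # Assumed toy setting: two independent standard-normal samples.
    # Their empirical d_1 distance reflects pure finite-sample error and
    # typically shrinks as the sample size n grows.
    rng = random.Random(0)
    for n in (100, 1_000, 10_000):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, wasserstein1_1d(a, b))
```

Sorting makes the computation $O(n \log n)$; in higher dimensions no such shortcut exists and a general optimal transport solver would be needed.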