Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional structure common in real data, such as that arising in natural images. In this work, we study the statistical convergence of score-based diffusion models for learning an unknown distribution $μ$ from finitely many samples. Under mild regularity conditions on the forward diffusion process and the data distribution, we derive finite-sample error bounds on the learned generative distribution, measured in the Wasserstein-$p$ distance. Unlike prior results, our guarantees hold for all $p \ge 1$ and require only a finite-moment assumption on $μ$, without compact-support, manifold, or smooth-density conditions. Specifically, given $n$ i.i.d.\ samples from $μ$ with finite $q$-th moment and appropriately chosen network architectures, hyperparameters, and discretization schemes, we show that the expected Wasserstein-$p$ error between the learned distribution $\hatμ$ and $μ$ scales as $\mathbb{E}\, \mathbb{W}_p(\hatμ,μ) = \widetilde{O}\!\left(n^{-1 / d^\ast_{p,q}(μ)}\right),$ where $d^\ast_{p,q}(μ)$ is the $(p,q)$-Wasserstein dimension of $μ$. Our results demonstrate that diffusion models naturally adapt to the intrinsic geometry of data and mitigate the curse of dimensionality, since the convergence rate depends on $d^\ast_{p,q}(μ)$ rather than the ambient dimension. Moreover, our theory conceptually bridges the analysis of diffusion models with that of GANs and the sharp minimax rates established in optimal transport. The proposed $(p,q)$-Wasserstein dimension also extends classical Wasserstein dimension notions to distributions with unbounded support, which may be of independent theoretical interest.
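
To make the dependence on $d^\ast_{p,q}(μ)$ concrete, the following minimal sketch evaluates only the polynomial part $n^{-1/d}$ of the stated rate, first with an intrinsic dimension and then with an ambient dimension in its place. The constants and logarithmic factors hidden in $\widetilde{O}(\cdot)$ are ignored, and the numerical values of `intrinsic_dim`, `ambient_dim`, and `n` are hypothetical choices made purely for illustration; they are not taken from the paper.

```python
def rate(n: int, dim: float) -> float:
    """Polynomial part of the bound, n^(-1/dim); constants and log factors are dropped."""
    return n ** (-1.0 / dim)


def samples_for_error(eps: float, dim: float) -> float:
    """Invert n^(-1/dim) = eps to obtain n = eps^(-dim)."""
    return eps ** (-dim)


# Hypothetical values, chosen only to illustrate the scaling.
intrinsic_dim = 10.0    # stand-in for d*_{p,q}(mu)
ambient_dim = 1000.0    # stand-in for the ambient dimension
n = 10**6

print("rate with intrinsic dimension:", rate(n, intrinsic_dim))
print("rate with ambient dimension:  ", rate(n, ambient_dim))
print("samples for eps = 0.5, intrinsic:", samples_for_error(0.5, intrinsic_dim))
```

Under these assumed values, the exponent $1/d$ is a hundred times larger in the intrinsic case, so the same sample size yields a much smaller value of $n^{-1/d}$. This is the sense in which a rate governed by $d^\ast_{p,q}(μ)$ mitigates the curse of dimensionality.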
