Diffusion models achieve state-of-the-art performance in a variety of generation tasks, yet their theoretical foundations lag far behind. This paper studies score approximation, score estimation, and distribution recovery for diffusion models when the data are supported on an unknown low-dimensional linear subspace. Our results provide sample complexity bounds for distribution estimation using diffusion models. We show that, with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated. Furthermore, the distribution generated from the estimated score function captures the geometric structure of the data and converges to a close vicinity of the data distribution. The convergence rate depends on the subspace dimension rather than the ambient dimension, indicating that diffusion models can circumvent the curse of ambient dimensionality.
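To make the subspace setting concrete, the following minimal sketch assumes, purely for illustration and not as the paper's construction, that the data are x_0 = A z with z a standard Gaussian in d dimensions and A a D x d matrix with orthonormal columns, and that noise is added by a variance-preserving forward process with alpha_t^2 = exp(-t) and sigma_t^2 = 1 - exp(-t). Under these assumptions the score of the noised marginal splits into a component inside the subspace and a component orthogonal to it. The function name and all parameter values are hypothetical.

```python
import numpy as np


def subspace_gaussian_score(x, A, t):
    """Score of the noised marginal at time t, assuming x_0 = A z with
    z ~ N(0, I_d) and A having orthonormal columns (illustrative only)."""
    sigma2 = 1.0 - np.exp(-t)      # noise variance of the assumed forward process
    on_subspace = A @ (A.T @ x)    # projection of x onto the subspace
    off_subspace = x - on_subspace # component orthogonal to the subspace
    # The marginal covariance is A A^T + sigma2 * (I - A A^T), so its inverse
    # acts as the identity on the subspace and as 1 / sigma2 off it.
    return -on_subspace - off_subspace / sigma2


rng = np.random.default_rng(0)
D, d, t = 8, 2, 0.1                               # ambient dim, subspace dim, time
A, _ = np.linalg.qr(rng.standard_normal((D, d)))  # D x d, orthonormal columns
x = rng.standard_normal(D)

score = subspace_gaussian_score(x, A, t)

# Cross-check against the dense formula -Sigma_t^{-1} x.
alpha2, sigma2 = np.exp(-t), 1.0 - np.exp(-t)
cov = alpha2 * (A @ A.T) + sigma2 * np.eye(D)
dense_score = -np.linalg.solve(cov, x)
```

In this Gaussian toy case the orthogonal term scales as 1 / sigma_t^2 and so grows without bound as t approaches 0, which is one way to see how a score-driven sampler pulls samples back toward the subspace, while the on-subspace term involves only d-dimensional quantities.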