In recent years, diffusion models, and more generally score-based deep generative models, have achieved remarkable success in various applications, including image and audio generation. In this paper, we view diffusion models as an implicit approach to nonparametric density estimation and study them within a statistical framework to analyze their surprising performance. A key challenge in high-dimensional statistical inference is leveraging low-dimensional structures inherent in the data to mitigate the curse of dimensionality. We assume that the underlying density exhibits a low-dimensional structure by factorizing into low-dimensional components, a property common in examples such as Bayesian networks and Markov random fields. Under suitable assumptions, we demonstrate that an implicit density estimator constructed from diffusion models adapts to the factorization structure and achieves the minimax optimal rate with respect to the total variation distance. In constructing the estimator, we design a sparse weight-sharing neural network architecture, where sparsity and weight-sharing are key features of practical architectures such as convolutional neural networks and recurrent neural networks.
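To make the factorization assumption concrete, the display below sketches the kind of structure presumably intended; the symbols $d$, $\mathrm{pa}(j)$, and $\psi_C$ are illustrative and the paper's exact condition may differ. Each factor depends on at most $d \ll D$ of the $D$ ambient coordinates, which is the sense in which the density decomposes into low-dimensional components.

```latex
% A hedged sketch of the factorization assumption. Each factor involves
% at most d of the D coordinates (d << D); pa(j) denotes the parents of
% node j in a Bayesian network, and C ranges over the cliques of a
% Markov random field. These symbols are illustrative, not necessarily
% the paper's notation.
\[
  p(x) \;=\; \prod_{j=1}^{D} p_j\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr),
  \qquad \bigl|\mathrm{pa}(j)\bigr| \le d - 1
  \quad \text{(Bayesian network)},
\]
\[
  p(x) \;\propto\; \prod_{C \in \mathcal{C}} \psi_C(x_C),
  \qquad \max_{C \in \mathcal{C}} |C| \le d
  \quad \text{(Markov random field)}.
\]
```

As a point of reference (a standard fact about nonparametric density estimation, not a restatement of the paper's theorem), the minimax rate for estimating a $\beta$-smooth density in total variation scales as $n^{-\beta/(2\beta + d_{\mathrm{eff}})}$ in the effective dimension $d_{\mathrm{eff}}$, so an estimator that adapts to the factorization pays for the factor dimension $d$ rather than the ambient dimension $D$.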
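For orientation, the following display recalls the standard score-based construction behind such an implicit density estimator; this is a generic sketch from the diffusion-model literature, and the paper's time horizon, noise schedule, and estimator may differ. Data are diffused by an Ornstein–Uhlenbeck forward process, a neural network score $\hat{s}$ is fit by score matching over a class $\mathcal{NN}$, and the estimator $\hat{p}$ is the marginal law of the time-reversed process driven by $\hat{s}$.

```latex
% Generic score-based construction (standard in the literature; the
% paper's precise setup may differ). Forward Ornstein-Uhlenbeck process:
\[
  \mathrm{d}X_t = -X_t\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t,
  \qquad X_0 \sim p_0,
\]
% Score estimation over a neural network class NN; in practice the
% intractable score of the marginal p_t is replaced by the conditional
% (denoising) target, which has the same minimizer:
\[
  \hat{s} \in \operatorname*{arg\,min}_{s \in \mathcal{NN}}
  \int_{t_0}^{T} \mathbb{E}\Bigl[\bigl\| s(X_t, t) - \nabla_x \log p_t(X_t) \bigr\|^2\Bigr]\,\mathrm{d}t,
\]
% Time reversal with the estimated score, initialized at (approximately)
% the stationary Gaussian; the implicit density estimator is the
% terminal marginal:
\[
  \mathrm{d}\tilde{X}_t
  = \Bigl(\tilde{X}_t + 2\,\hat{s}\bigl(\tilde{X}_t, T - t\bigr)\Bigr)\mathrm{d}t
  + \sqrt{2}\,\mathrm{d}\bar{W}_t,
  \qquad \tilde{X}_0 \sim \mathcal{N}(0, I_D),
  \qquad \hat{p} := \mathrm{Law}\bigl(\tilde{X}_T\bigr).
\]
```

The estimator is "implicit" in this sense: $\hat{p}$ is defined through samples of the reverse process rather than through an explicit density formula, and the statistical question is how close $\mathrm{Law}(\tilde{X}_T)$ is to $p_0$ in total variation.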