We study the theoretical behavior of denoising score matching, the learning task associated with diffusion models, when the data distribution is supported on a low-dimensional manifold and the score is parameterized by a random feature neural network. We derive asymptotically exact expressions for the test, train, and score errors in the high-dimensional limit. Our analysis reveals that, for linear manifolds, the sample complexity required to learn the score function scales linearly with the intrinsic dimension of the manifold rather than with the ambient dimension. Perhaps surprisingly, the benefits of low-dimensional structure start to diminish once the manifold is non-linear. These results indicate that diffusion models can benefit from structured data; however, the dependence on the specific type of structure is subtle and intricate.
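To make the learning task concrete, the following is a minimal sketch of denoising score matching with a random feature model on data drawn from a linear manifold. It is an illustration under stated assumptions, not the setup analyzed in the paper: the single fixed noise level `noise_std`, the `tanh` activation, the ridge penalty, the closed-form least-squares solve, and all function names are hypothetical choices made here for brevity.

```python
import numpy as np


def sample_linear_manifold(n, d, k, rng):
    """Draw n points in R^d lying on a random k-dimensional linear subspace."""
    # QR of a Gaussian matrix gives a d x k matrix with orthonormal columns.
    basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
    latent = rng.standard_normal((n, k))
    return latent @ basis.T


def fit_random_feature_score(x, noise_std, num_features, ridge, rng):
    """Fit s(y) = tanh(y W^T / sqrt(d)) A by denoising score matching.

    Assumption: one fixed noise level and a ridge-regularized squared loss.
    Only the second-layer weights A are trained; W stays at its random draw.
    """
    n, d = x.shape
    z = rng.standard_normal((n, d))
    x_noisy = x + noise_std * z            # corrupt the clean samples
    target = -z / noise_std                # denoising score matching target

    W = rng.standard_normal((num_features, d))  # fixed random first layer

    def features(points):
        return np.tanh(points @ W.T / np.sqrt(d))

    phi = features(x_noisy)                # n x num_features
    # Closed-form minimizer of (1/n)||phi A - target||^2 + ridge ||A||^2.
    gram = phi.T @ phi / n + ridge * np.eye(num_features)
    A = np.linalg.solve(gram, phi.T @ target / n)

    def score(points):
        return features(points) @ A

    train_loss = np.mean(np.sum((phi @ A - target) ** 2, axis=1))
    zero_loss = np.mean(np.sum(target ** 2, axis=1))  # loss of the zero predictor
    return score, train_loss, zero_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = sample_linear_manifold(n=500, d=50, k=5, rng=rng)
    score, train_loss, zero_loss = fit_random_feature_score(
        x, noise_std=0.5, num_features=200, ridge=1e-3, rng=rng
    )
    print("train loss:", train_loss, "zero-predictor loss:", zero_loss)
```

Varying the intrinsic dimension `k` while holding the ambient dimension `d` fixed, and tracking the error on fresh samples as `n` grows, is one way to probe empirically the kind of dependence the abstract describes; the sketch itself makes no claim about the resulting scaling.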