We study the theoretical behavior of denoising score matching (the learning task associated with diffusion models) when the data distribution is supported on a low-dimensional manifold and the score is parameterized by a random feature neural network. We derive asymptotically exact expressions for the test, train, and score errors in the high-dimensional limit. Our analysis reveals that, for linear manifolds, the sample complexity required to learn the score function scales linearly with the intrinsic dimension of the manifold rather than with the ambient dimension. Perhaps surprisingly, the benefits of low-dimensional structure start to diminish once the manifold is non-linear. These results indicate that diffusion models can benefit from structured data; however, the dependence on the specific type of structure is subtle and intricate.
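To make the learning task concrete, here is a minimal sketch of denoising score matching with a random feature model on a linear manifold. The sketch rests on several assumptions of our own, which are not necessarily the setup analyzed in the paper:

- a single diffusion time `t` with an Ornstein-Uhlenbeck forward process,
- Gaussian latent coordinates on a `d`-dimensional subspace of R^D,
- a frozen first layer with tanh activation,
- a ridge-regularized least-squares fit of the second layer.

All sizes and names (`D`, `d`, `p`, `n`, `t`, `lam`, `sample_data`, `noisy_pairs`, `true_score`) are hypothetical. The sketch reports empirical counterparts of the three quantities named above: the train and test denoising losses, and the score error against the exact score. For Gaussian data on a subspace, the exact score is available in closed form.

```python
import numpy as np

# Hypothetical toy sizes (our assumption, not the paper's): ambient dim D,
# intrinsic dim d, p random features, n train / n_test samples,
# a single diffusion time t, and ridge penalty lam.
D, d, p, n, n_test, t, lam = 50, 5, 200, 400, 2000, 0.5, 1e-3
rng = np.random.default_rng(0)

# Orthonormal D x d basis spanning a linear manifold (a d-dim subspace of R^D).
U, _ = np.linalg.qr(rng.standard_normal((D, d)))

def sample_data(m):
    # x0 = U s with s ~ N(0, I_d): Gaussian data supported on the subspace.
    return rng.standard_normal((m, d)) @ U.T

def noisy_pairs(x0):
    # Assumed OU noising at time t: x_t = e^{-t} x0 + sqrt(h) z, h = 1 - e^{-2t}.
    # The denoising score matching regression target is -z / sqrt(h).
    h = 1.0 - np.exp(-2.0 * t)
    z = rng.standard_normal(x0.shape)
    return np.exp(-t) * x0 + np.sqrt(h) * z, -z / np.sqrt(h)

# Random feature network: frozen first layer W, tanh activation,
# and a trainable second layer A (fitted below).
W = rng.standard_normal((p, D))

def features(x):
    return np.tanh(x @ W.T / np.sqrt(D))

def true_score(xt):
    # For x0 ~ N(0, U U^T), x_t ~ N(0, Sigma) with
    # Sigma = e^{-2t} U U^T + h I. Since U has orthonormal columns,
    # Sigma^{-1} = (I - U U^T) / h + U U^T, so the score is -x_t Sigma^{-1}.
    h = 1.0 - np.exp(-2.0 * t)
    P = U @ U.T
    prec = (np.eye(D) - P) / h + P
    return -xt @ prec

def mse(pred, target):
    # Average over samples of the squared Euclidean error.
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))

# Fit A by ridge regression on one noisy copy of each training point.
xt_tr, y_tr = noisy_pairs(sample_data(n))
F_tr = features(xt_tr)                                                  # (n, p)
A = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(p), F_tr.T @ y_tr).T  # (D, p)

# Evaluate on fresh data and fresh noise.
xt_te, y_te = noisy_pairs(sample_data(n_test))
F_te = features(xt_te)
train_err = mse(F_tr @ A.T, y_tr)                # DSM loss on the training pairs
test_err = mse(F_te @ A.T, y_te)                 # DSM loss on new pairs
score_err = mse(F_te @ A.T, true_score(xt_te))   # distance to the exact score
print(f"train {train_err:.3f}  test {test_err:.3f}  score {score_err:.3f}")
```

Varying `n` relative to `d` and `D` in this toy setup is one way to probe how sample complexity depends on intrinsic versus ambient dimension. A non-linear manifold could be simulated by replacing `sample_data` with a curved embedding, although `true_score` would then no longer be exact. No particular numerical outcome is claimed here.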