Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering

Recent empirical studies have shown that diffusion models can effectively learn image distributions and generate new samples. Remarkably, they can do so from a small number of training samples even though the image dimension is large, circumventing the curse of dimensionality. In this work, we provide theoretical insight into this phenomenon by leveraging three key empirical observations: (i) the low intrinsic dimensionality of image data, (ii) the union-of-manifolds structure of image data, and (iii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to model the underlying distribution of image data as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model according to the score function of the assumed distribution. Under this setup, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. Based on this equivalence, we further show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimensions under the above data and model assumptions. This insight sheds light on why diffusion models can break the curse of dimensionality and exhibit a phase transition in learning distributions. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, which facilitates image editing. We validate these results with experiments on both simulated distributions and image datasets.
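
To make the data and model assumptions concrete, the following is a minimal sketch rather than the paper's implementation. It samples from a mixture of low-rank Gaussians and evaluates the corresponding posterior-mean denoiser, which is a weighted sum of low-rank projections. The function names, the simplified noise model `x_noisy = x_0 + sigma * eps`, the zero-mean components with identity covariance inside each subspace, and the equal subspace dimensions are all assumptions made for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

def random_subspaces(n, d, K, rng):
    """Return K orthonormal bases of shape (n, d), via QR of Gaussian matrices."""
    return [np.linalg.qr(rng.standard_normal((n, d)))[0] for _ in range(K)]

def sample_molrg(bases, weights, num, rng):
    """Draw samples x = U_k a with k ~ weights and a ~ N(0, I_d) (assumed model)."""
    labels = rng.choice(len(bases), size=num, p=weights)
    d = bases[0].shape[1]
    coeffs = rng.standard_normal((num, d))
    samples = np.stack([bases[k] @ a for k, a in zip(labels, coeffs)])
    return samples, labels

def posterior_mean_denoiser(x_noisy, bases, weights, sigma):
    """E[x_0 | x_noisy] for x_noisy = x_0 + sigma * eps under the mixture above."""
    scale = 1.0 / (1.0 + sigma**2)
    phi = 1.0 / (2.0 * sigma**2 * (1.0 + sigma**2))
    # Log-responsibility of each component (equal subspace dimensions assumed,
    # so the determinant terms cancel across components).
    logits = np.array([np.log(w) + phi * np.sum((U.T @ x_noisy) ** 2)
                       for U, w in zip(bases, weights)])
    logits -= logits.max()  # stabilise the softmax
    resp = np.exp(logits) / np.exp(logits).sum()
    # Each component contributes a rank-d projection of the noisy input.
    projections = [U @ (U.T @ x_noisy) for U in bases]
    return scale * sum(r * p for r, p in zip(resp, projections))

# Hypothetical usage: 3 subspaces of dimension 4 in a 64-dimensional ambient space.
rng = np.random.default_rng(0)
bases = random_subspaces(n=64, d=4, K=3, rng=rng)
weights = np.array([0.5, 0.3, 0.2])
x0, labels = sample_molrg(bases, weights, num=5, rng=rng)
x_hat = posterior_mean_denoiser(x0[0] + 0.1 * rng.standard_normal(64),
                                bases, weights, sigma=0.1)
```

In this sketch the denoiser depends on the data only through the subspace bases, which illustrates why fitting such a model to training samples resembles recovering the subspaces, that is, a subspace clustering problem.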

