While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We first address this issue empirically, showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias: they learn simple, pair-wise input statistics before specialising to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both the pair-wise and the higher-order correlations of the inputs. We identify a scalar invariant of the model, which we call the diffusion information exponent in analogy to related invariants in other learning paradigms, that governs the sample complexity of learning pair-wise and higher-order correlations. Using this invariant, we prove that the denoiser learns the simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant becomes linear if the pair-wise and higher-order statistics share a correlated latent structure. Our work thus describes a key mechanism by which diffusion models learn distributions of increasing complexity.
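To make the setting concrete, the following is a minimal sketch of the kind of data model the abstract describes: a Gaussian background plus a Gaussian spike along a direction u (controlling pair-wise statistics) and a non-Gaussian spike along a direction v (contributing a non-zero fourth cumulant). The parameter names, the Rademacher latent, and the `sample_mixed_cumulant` helper are illustrative assumptions, not the paper's exact specification of the mixed cumulant model.

```python
import numpy as np

def sample_mixed_cumulant(n, d, beta=1.0, gamma=1.0, seed=None):
    """Sample n inputs from an illustrative mixed-cumulant-style model.

    Each input mixes an isotropic Gaussian background with
    - a Gaussian spike along u: a Gaussian latent g times u, which only
      changes the covariance (pair-wise statistics), and
    - a Rademacher spike along v: a +/-1 latent s times v, whose
      projection onto v has fourth cumulant -2 * gamma**4 != 0,
      i.e. genuinely non-Gaussian higher-order statistics.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                    # pair-wise spike direction
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)                    # higher-order spike direction
    z = rng.standard_normal((n, d))           # Gaussian background noise
    g = rng.standard_normal(n)                # Gaussian latent -> covariance spike
    s = rng.choice([-1.0, 1.0], size=n)       # Rademacher latent -> fourth cumulant
    x = z + beta * np.outer(g, u) + gamma * np.outer(s, v)
    return x, u, v
```

Under these assumptions, estimating the covariance recovers u, while recovering v requires the fourth-order statistics of the projections x @ v, which is the distinction between pair-wise and higher-order learning that the abstract's sample-complexity results address.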