While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue by first showing empirically that standard diffusion models trained on natural images exhibit a distributional simplicity bias: they learn simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, in which we precisely control both the pair-wise and the higher-order correlations of the inputs. We identify a scalar invariant of the model, which we call the diffusion information exponent in analogy to related invariants in other learning paradigms, that governs the sample complexity of learning pair-wise and higher-order correlations. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant becomes linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism by which diffusion models can learn distributions of increasing complexity.
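The three sample-complexity statements can be summarized schematically as follows. This is only a reading aid under assumed notation: the symbols $n$ (number of training samples) and $d$ (input dimension) do not appear in the abstract, and we assume that "linear" and "cubic" refer to scaling in $d$. Constants and logarithmic factors are not specified here.

```latex
% Assumed notation (not from the abstract):
%   n = number of training samples, d = input dimension.
\begin{align*}
  \text{pair-wise statistics:} \quad & n \sim d, \\
  \text{fourth cumulant (generic case):} \quad & n \gtrsim d^{3}, \\
  \text{fourth cumulant (correlated latent structure):} \quad & n \sim d.
\end{align*}
```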