Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structures within the data significantly influence the speed at which a model learns to manipulate a concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we propose an abstraction of prior work's compositional generalization problem by introducing a structured identity mapping (SIM) task, where a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights -- e.g., we identify a novel mechanism underlying the non-monotonic dynamics of the test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.
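To make the SIM setup concrete, the following is a minimal sketch, not the paper's actual experimental configuration: centroids are placed at scaled binary concept vectors (so the mixture's structure mirrors a compositional concept space), one composition is held out, and a small two-layer linear network is trained on the identity mapping with plain gradient descent. All specific choices here (dimension, scale, learning rate, network width) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structured centroids: binary concept vectors, scaled to separate the
# mixture components. The composition [1, 1] is held out from training.
scale = 3.0
train_concepts = np.array([[0, 0], [0, 1], [1, 0]], dtype=float)
test_concepts = np.array([[1, 1]], dtype=float)

def sample(concepts, n_per, sigma=0.1):
    """Draw points from an isotropic Gaussian around each scaled centroid."""
    centroids = scale * concepts
    return np.concatenate(
        [c + sigma * rng.standard_normal((n_per, 2)) for c in centroids]
    )

x_train = sample(train_concepts, 200)
x_test = sample(test_concepts, 200)

# Two-layer linear network trained on the identity mapping x -> x.
d, h = 2, 16
W1 = 0.01 * rng.standard_normal((h, d))
W2 = 0.01 * rng.standard_normal((d, h))
lr = 0.05

for step in range(2000):
    hidden = x_train @ W1.T                 # (n, h)
    err = hidden @ W2.T - x_train           # identity-mapping residual
    g2 = err.T @ hidden / len(x_train)      # dL/dW2, shape (d, h)
    g1 = W2.T @ err.T @ x_train / len(x_train)  # dL/dW1, shape (h, d)
    W2 -= lr * g2
    W1 -= lr * g1

test_loss = np.mean((x_test @ W1.T @ W2.T - x_test) ** 2)
print(f"held-out composition MSE: {test_loss:.4f}")
```

Tracking the held-out loss over training steps, rather than only its final value, is what exposes the staged and non-monotonic dynamics the abstract refers to.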