Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structures within the data significantly influence the speed at which a model learns to manipulate a given concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we propose an abstraction of prior work's compositional generalization problem by introducing a structured identity mapping (SIM) task, in which a model is trained to learn the identity mapping on a Gaussian mixture whose centroids are structurally organized. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights -- e.g., we find a novel mechanism for non-monotonic learning dynamics of the test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.
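As a rough illustration of the SIM setup described above, the sketch below trains a two-layer linear network to reproduce its input on a Gaussian mixture whose centroids sit at hypercube vertices, with one vertex held out as the out-of-distribution composition. The specific centroid placement, network depth, and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed compositional structure: centroids at vertices of a square,
# each coordinate encoding one binary "concept". The vertex (1, 1) is
# held out, so it is a novel composition of concepts seen separately.
d, scale, sigma = 2, 3.0, 0.3
vertices = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float) * scale
train_centroids, test_centroid = vertices[:3], vertices[3]

def sample(centroids, n):
    """Draw n points from an equal-weight Gaussian mixture."""
    c = centroids[rng.integers(len(centroids), size=n)]
    return c + sigma * rng.normal(size=(n, c.shape[1]))

X = sample(train_centroids, 600)

# Two-layer linear network trained on the identity mapping with full-batch
# gradient descent on the mean squared error ||W2 W1 x - x||^2.
W1 = 0.01 * rng.normal(size=(d, d))
W2 = 0.01 * rng.normal(size=(d, d))
lr = 0.05
for step in range(2000):
    H = X @ W1.T
    Y = H @ W2.T
    G = 2 * (Y - X) / len(X)      # gradient of the loss w.r.t. Y
    W2 -= lr * G.T @ H
    W1 -= lr * (G @ W2).T @ X

# Test loss on samples around the held-out centroid: if the network has
# recovered the global identity map, this is small even off-distribution.
Xt = sample(test_centroid[None, :], 200)
test_loss = float(np.mean((Xt @ W1.T @ W2.T - Xt) ** 2))
print(test_loss)
```

Tracking `test_loss` over training steps (rather than only at the end, as here) is what reveals the sequential and non-monotonic dynamics the abstract refers to.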