Swing-by Dynamics in Concept Learning and Compositional Generalization

Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structure in the data significantly influences how quickly a model learns to manipulate a concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we abstract the compositional generalization problem studied in prior work by introducing a structured identity mapping (SIM) task, in which a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite the task's simplicity, these dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights: for example, we identify a novel mechanism for non-monotonic test-loss dynamics in the early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.
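
To make the SIM setup concrete, the following is a minimal sketch rather than the paper's implementation. The abstract does not specify the details, so everything here is an illustrative assumption: the centroids are placed along coordinate axes (centroid k at `mus[k] * e_k`), the noise is isotropic, the model is a single linear map trained by full-batch gradient descent, and the out-of-distribution test point is taken to be the sum of all centroids. The names `make_sim_data` and `train_identity` are hypothetical.

```python
import numpy as np


def make_sim_data(mus, sigma, n_per_cluster, rng):
    """Sample a Gaussian mixture whose k-th centroid is mus[k] * e_k.

    Assumption: axis-aligned centroids and isotropic noise of scale sigma.
    """
    d = len(mus)
    clusters = []
    for k in range(d):
        center = np.zeros(d)
        center[k] = mus[k]
        noise = sigma * rng.normal(size=(n_per_cluster, d))
        clusters.append(center + noise)
    return np.concatenate(clusters, axis=0)


def train_identity(X, x_test, steps, lr, init_scale, rng):
    """Fit a linear map W to the identity on X by full-batch gradient descent.

    Training loss: (1 / 2n) * sum_i ||W x_i - x_i||^2.
    Returns the final W and the squared error at x_test after every step.
    """
    d = X.shape[1]
    W = init_scale * rng.normal(size=(d, d))
    second_moment = X.T @ X / len(X)
    eye = np.eye(d)
    test_losses = []
    for _ in range(steps):
        grad = (W - eye) @ second_moment  # exact gradient of the loss above
        W = W - lr * grad
        test_losses.append(float(np.sum((W @ x_test - x_test) ** 2)))
    return W, np.array(test_losses)


rng = np.random.default_rng(0)
mus = np.array([1.0, 2.0])   # assumed centroid distances, one per concept
X = make_sim_data(mus, sigma=0.1, n_per_cluster=500, rng=rng)
x_test = mus.copy()          # assumed OOD point: all concepts combined
W, test_losses = train_identity(
    X, x_test, steps=2000, lr=0.1, init_scale=0.01, rng=rng
)
```

Plotting `test_losses` against the step index is one way to inspect how the error at the unseen composition evolves during training. Under these assumptions, directions with larger `mus[k]` have larger second-moment eigenvalues and are therefore fit faster by gradient descent, which gives a simple picture of how data structure can set the order in which concepts are learned.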
