Large language models can compose skills to perform complex tasks, many of which may not have been seen during training. How exactly this composition occurs remains elusive. In this paper, we study a mechanism for compositional generalization in transformers in a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics empirically, which reveals three phases of learning: first, modular addition is learned, then the structure required for variable assignment, followed by a refinement phase in which the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.