Assign and Add: A Mechanistic Study of Compositional Arithmetic

Large language models can compose skills to perform complex tasks, many of which might not have been seen during training. The details of how exactly this composition occurs remain elusive. In this paper, we study a mechanism for compositional generalization in transformers by considering a simple controlled setting involving variable assignment and modular addition. By partitioning our training data into disjoint sets, we observe that small transformers are able to generalize to previously unseen combinations of variables and numbers. Our mechanistic analysis shows that the same "modular addition" MLP module is used whether the inputs are given directly or indirectly through a separate variable assignment mechanism. We also analyze the training dynamics through an empirical lens, which reveals three phases of learning: first, modular addition is learned; then, the structure required for variable assignment; and finally, a refinement phase in which the model generalizes to some hard sequences not seen in training. Finally, we provide a theoretical framework to explain how compositionality emerges from training dynamics. These results suggest that compositional generalization can be a natural consequence of the compositionality of internal mechanisms in transformers.
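The controlled setting described above can be illustrated with a minimal sketch. The abstract does not specify the modulus, the variable names, the sequence format, or the split ratio, so everything below (`P`, `VARIABLES`, the `a=5;a+4=` prompt layout, `held_out_fraction`) is a hypothetical assumption chosen only to make the idea concrete; it is not the authors' dataset or code.

```python
import random

P = 7                        # hypothetical modulus (not given in the abstract)
VARIABLES = ["a", "b", "c"]  # hypothetical variable names


def make_example(var, value, addend, p=P):
    """Build one sequence: assign `value` to `var`, then add `addend` mod p.

    The textual layout is an assumption made for illustration.
    """
    prompt = f"{var}={value};{var}+{addend}="
    answer = (value + addend) % p
    return prompt, answer


def split_pairs(variables, p=P, held_out_fraction=0.25, seed=0):
    """Partition all (variable, value) pairs into disjoint train/held-out sets."""
    pairs = [(v, n) for v in variables for n in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * held_out_fraction)
    return pairs[cut:], pairs[:cut]  # (train, held-out)


train_pairs, test_pairs = split_pairs(VARIABLES)
```

Under these assumptions, a model trained only on sequences built from `train_pairs` would be evaluated on sequences built from `test_pairs`, that is, on variable and number combinations it never saw together.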

