We investigate the mechanisms that arise when transformers are trained to solve arithmetic on sequences where tokens are variables whose meaning is determined only through their in-context interactions. While prior work has studied transformers in settings where the answer relies on fixed parametric or geometric information encoded in token embeddings, we devise a new in-context reasoning task in which the assignment of tokens to specific algebraic elements varies from one sequence to another. Despite this challenging setup, transformers achieve near-perfect accuracy on the task and even generalize to unseen groups. We develop targeted data distributions to construct causal tests of a set of hypothesized mechanisms, and we isolate three mechanisms that models consistently learn: commutative copying, in which a dedicated head copies answers; identity element recognition, which distinguishes identity-containing facts; and closure-based cancellation, which tracks group membership to constrain valid answers. Our findings show that the reasoning strategies transformers learn depend on the task structure, and that models can develop symbolic reasoning mechanisms when trained to reason in-context about variables whose meanings are not fixed.