Positional versus Symbolic Attention Heads: Learning Dynamics, RoPE Geometry, and Length Generalization

Transformer-based language models are now in widespread use. Understanding the mechanisms by which they solve structured tasks, and predicting how they will behave in novel scenarios, is therefore important for their safe deployment. We study the learning dynamics of attention heads in a controlled setting by training a decoder-only Transformer (GPT-J) on two structurally equivalent multi-hop reasoning tasks: a number task that requires positional reasoning and a letter task that requires symbolic reasoning. Using a recently introduced metric that classifies an attention head's behavior on a given prompt as positional or symbolic, we show that successful learning is associated with the emergence of pure heads, i.e., heads whose behavior is either purely positional or purely symbolic. Despite the tasks' structural equivalence, they impose different mechanistic demands: the number task requires both positional and symbolic heads, whereas the letter task requires only symbolic heads. We then identify the computational roles of these heads, characterize the basic functions they implement, and give theoretical constructions showing how single-layer RoPE-based attention can realize these functions through geometrically interpretable query, key, and value operations. This analysis yields a quantitative separation between positional and symbolic mechanisms in their robustness to longer sequences, formalized through a novel notion of discrepancy. We empirically validate the resulting predictions in both controlled and real-world models, showing that symbolic mechanisms extrapolate more reliably to longer sequences, whereas positional mechanisms face sharper limitations.
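
As background for the RoPE geometry mentioned above, the following is a minimal sketch of the standard rotary position embedding, not the paper's own constructions. It assumes the common formulation in which consecutive coordinate pairs are rotated by angle pos * base^(-2i/d) with base 10000; the helper names `rope_rotate`, `rope_score`, and `dot` are hypothetical. The sketch shows the property that makes RoPE geometrically interpretable: rotation preserves vector norms, and the query-key logit depends only on the relative offset between positions, not on their absolute values.

```python
import math
import random


def rope_rotate(vec, pos, base=10000.0):
    """Rotate consecutive pairs (vec[2i], vec[2i+1]) by pos * base**(-2i/d).

    Assumption: standard RoPE with pairwise (interleaved) rotation; this is a
    generic illustration, not the paper's construction.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)  # angle for this frequency band
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of the pair
    return out


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def rope_score(q, k, q_pos, k_pos):
    """Unnormalized attention logit between a rotated query and a rotated key."""
    return dot(rope_rotate(q, q_pos), rope_rotate(k, k_pos))


# Usage: the same query/key contents at the same relative offset (3) give the
# same logit, even after shifting both absolute positions by 98.
random.seed(0)
q = [random.gauss(0, 1) for _ in range(8)]
k = [random.gauss(0, 1) for _ in range(8)]
near = rope_score(q, k, q_pos=5, k_pos=2)
far = rope_score(q, k, q_pos=103, k_pos=100)
print(math.isclose(near, far, abs_tol=1e-9))  # True
```

Because each pair is rotated by an angle proportional to position, the product of a query rotated by one angle and a key rotated by another reduces to a single rotation by their difference. Any dependence on position that a RoPE head expresses is therefore a dependence on relative offset.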
