Induction heads are attention heads that perform inductive copying by matching patterns from earlier context and copying their continuations verbatim. As models develop induction heads, they experience a sharp drop in training loss, a phenomenon cited as evidence that induction heads may underlie a wide range of in-context learning (ICL) capabilities. In this work, we investigate whether induction heads are a necessary building block for learning abstractive ICL capabilities (i.e., tasks where the answer is not contained in the input context), or whether such capabilities can emerge independently. We propose Hapax, a training regime that omits the loss contribution of tokens predictable by induction heads. Despite a significant reduction in inductive copying, abstractive ICL capabilities are preserved, with the model achieving higher accuracy than the vanilla model on 13 out of 21 tasks, even though 31.7% of tokens are omitted from the loss. Furthermore, our model achieves lower loss values on token positions that induction heads cannot predict. Mechanistic analysis shows that models trained with Hapax develop fewer and weaker induction heads despite preserving abstractive ICL capabilities. Our findings suggest that the developmental link between induction heads and abstractive ICL capabilities is weaker than previously hypothesized.
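The core of the training regime described above is a loss mask over tokens that an induction head could predict. As a minimal sketch, one common operationalization (an assumption here, not necessarily the paper's exact criterion) flags a position as induction-predictable when it completes a bigram that already occurred earlier in the context, since an idealized induction head predicts the next token by finding an earlier occurrence of the current token and copying what followed it:

```python
def induction_mask(tokens):
    """Flag positions whose token completes a bigram seen earlier in context.

    Assumption for illustration: an idealized induction head predicts token t
    by locating an earlier occurrence of tokens[t-1] and copying the token
    that followed it, so position t is "induction-predictable" when the
    bigram (tokens[t-1], tokens[t]) already appeared earlier in the sequence.
    """
    seen = set()
    mask = [False] * len(tokens)
    for t in range(1, len(tokens)):
        if (tokens[t - 1], tokens[t]) in seen:
            mask[t] = True
        # The bigram ending at t becomes "earlier context" for later positions.
        seen.add((tokens[t - 1], tokens[t]))
    return mask


def hapax_style_loss(per_token_loss, mask):
    """Average per-token loss over positions NOT flagged as induction-predictable.

    Hypothetical helper illustrating the idea of omitting flagged tokens
    from the loss; `per_token_loss` would come from a language model's
    cross-entropy over the sequence.
    """
    kept = [l for l, m in zip(per_token_loss, mask) if not m]
    return sum(kept) / len(kept) if kept else 0.0
```

For example, in the sequence `A B C A B`, only the final `B` completes a previously seen bigram (`A B`), so only its loss term would be dropped. In a real training loop the same masking would be applied to the per-position cross-entropy tensor before reduction.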