Advances in mechanistic interpretability have identified specialized attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the role these heads might play in improving model performance remains unexplored. This work investigates whether retrieval heads can be leveraged to enhance the long-context capabilities of LLMs. Specifically, we propose RetMask, a method that generates training signals by contrasting the outputs of a model with those of an ablated variant in which its retrieval heads are masked. This mechanism-grounded approach yields substantial improvements: +2.28 points on HELMET at 128K context for Llama-3.1, including a +70% gain on generation with citations and +32% on passage re-ranking, while preserving performance on general tasks. Experiments across three model families show that effectiveness depends on how retrieval heads are organized: models with concentrated retrieval-head patterns respond strongly, whereas those with distributed patterns show limited gains. This relationship validates the functional role of retrieval heads and demonstrates that mechanistic insights can be translated into performance gains.
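To make the ablation mechanism concrete, below is a minimal sketch of the contrastive idea: zero out a set of designated retrieval heads and compare the ablated model's output distribution against the intact model's. The checkpoint name, the (layer, head) indices, and the KL-divergence contrast signal are illustrative assumptions, not the exact RetMask recipe described in the paper.

```python
# Minimal sketch of contrasting an intact model with a retrieval-head-ablated
# variant. Assumes a Llama-style architecture from HuggingFace transformers;
# the model name and head indices below are hypothetical placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.1-8B"      # assumed checkpoint
RETRIEVAL_HEADS = {(12, 3), (17, 9)}   # hypothetical (layer, head) pairs

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
model.eval()

head_dim = model.config.hidden_size // model.config.num_attention_heads

def make_mask_hook(heads):
    # In Llama, o_proj consumes the concatenated per-head outputs, so zeroing
    # the slice of its input belonging to a head ablates that head.
    def hook(module, args):
        hidden = args[0].clone()
        for h in heads:
            hidden[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (hidden,)
    return hook

def ablated_logits(input_ids):
    # Temporarily install masking hooks on the layers that host retrieval heads.
    by_layer = {}
    for layer, head in RETRIEVAL_HEADS:
        by_layer.setdefault(layer, []).append(head)
    handles = [
        model.model.layers[layer].self_attn.o_proj.register_forward_pre_hook(
            make_mask_hook(heads)
        )
        for layer, heads in by_layer.items()
    ]
    try:
        with torch.no_grad():
            return model(input_ids).logits
    finally:
        for h in handles:
            h.remove()

prompt = tok("The key fact stated earlier in the document is", return_tensors="pt")
with torch.no_grad():
    clean = model(prompt.input_ids).logits
masked = ablated_logits(prompt.input_ids)

# Divergence between the intact and ablated next-token distributions: tokens
# that degrade most under ablation plausibly depend on retrieval heads, and
# this contrast could serve as a training signal.
signal = F.kl_div(
    F.log_softmax(masked[:, -1].float(), dim=-1),
    F.softmax(clean[:, -1].float(), dim=-1),
    reduction="batchmean",
)
print(f"contrast signal (KL): {signal.item():.4f}")
```

The hook-based masking leaves the model weights untouched, so the same model instance can serve as both the intact and ablated variant within one pass over the data.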