Advances in mechanistic interpretability have identified specialized attention heads, known as retrieval heads, that are responsible for retrieving information from the context. However, the role of these heads in improving model performance remains unexplored. This work investigates whether retrieval heads can be leveraged to enhance the long-context capabilities of LLMs. Specifically, we propose RetMask, a method that generates training signals by contrasting the normal model's outputs with those of an ablated variant in which the retrieval heads are masked. This mechanism-grounded approach yields substantial improvements: +2.28 points on HELMET at 128K context for Llama-3.1, including a +70% gain on generation with citation and +32% on passage re-ranking, while preserving performance on general tasks. Experiments across three model families reveal that effectiveness depends on how retrieval heads are organized: models with concentrated retrieval-head patterns respond strongly, while those with distributed patterns show limited gains. This mechanistic relationship validates the function of retrieval heads and demonstrates that mechanistic insights can be translated into performance gains.
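The core contrast behind RetMask can be illustrated with a toy multi-head attention layer. The sketch below is not the paper's implementation; it only shows, under assumed shapes and a hypothetical `head_mask` argument, how masking one head's output yields a divergence between normal and ablated outputs that could serve as a training signal.

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, head_mask=None):
    """Toy multi-head self-attention (no output projection).

    x: (seq, d_model); Wq/Wk/Wv: (n_heads, d_model, d_head).
    head_mask: optional list of bools; False zeroes that head's output,
    mimicking retrieval-head ablation.
    """
    n_heads = Wq.shape[0]
    outs = []
    for h in range(n_heads):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        scores = q @ k.T / np.sqrt(q.shape[-1])
        attn = np.exp(scores - scores.max(-1, keepdims=True))
        attn /= attn.sum(-1, keepdims=True)
        out = attn @ v
        if head_mask is not None and not head_mask[h]:
            out = np.zeros_like(out)  # ablate this head
        outs.append(out)
    return np.concatenate(outs, axis=-1)  # (seq, n_heads * d_head)

rng = np.random.default_rng(0)
d_model, d_head, n_heads, seq = 8, 4, 2, 5
x = rng.normal(size=(seq, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))

normal = multi_head_attention(x, Wq, Wk, Wv)
ablated = multi_head_attention(x, Wq, Wk, Wv, head_mask=[False, True])

# Divergence between the two runs: large where the masked head mattered.
# In RetMask this contrast is the basis of the training signal.
signal = np.abs(normal - ablated).sum()
```

In a real LLM the ablation would instead be applied to the identified retrieval heads (e.g. via attention-module hooks), and the contrast would be taken over output distributions rather than raw activations.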