Recent work has identified a subset of attention heads in Transformers as retrieval heads, which are responsible for retrieving information from the context. In this work, we first investigate retrieval heads in multilingual contexts. In multilingual language models, we find that retrieval heads are often shared across multiple languages. Expanding the study to the cross-lingual setting, we identify Retrieval-Transition Heads (RTHs), which govern the transition to output in a specific target language. Our experiments reveal that RTHs are distinct from retrieval heads and more vital for Chain-of-Thought reasoning in multilingual LLMs. Across four multilingual benchmarks (MMLU-ProX, MGSM, MLQA, and XQuAD) and two model families (Qwen-2.5 and Llama-3.1), we demonstrate that masking RTHs induces a larger performance drop than masking retrieval heads (RHs). Our work advances the understanding of multilingual LLMs by isolating the attention heads responsible for mapping to target languages.