Large Language Models (LLMs) are increasingly applied in real-world scenarios owing to their strong generalization and generative abilities. However, they exhibit position bias, also known as "lost in the middle": the position of key information within a prompt can significantly affect accuracy, and the effect is especially pronounced in long-context scenarios. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, the causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling these positional hidden states. Experiments on the NaturalQuestions multi-document QA, KV retrieval, LongBench, and timeline reorder tasks, using various models including RoPE models, context-window-extended models, and ALiBi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.
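The two technical claims in the abstract can be illustrated with a toy sketch. It is not the authors' implementation (their code is at the URL above). It assumes a single attention head with identity projections and no position embeddings, and the helper names `attention` and `scale_dimension`, the dimension index, and the scaling factor are all hypothetical. The sketch first checks that, without a mask, attention outputs carry no order information, whereas a causal mask makes each output depend on the token's position. It then shows what rescaling a single hidden-state dimension looks like mechanically.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(states, causal):
    """Toy single-head self-attention: identity projections, no position embeddings."""
    n = len(states)
    dim = len(states[0])
    out = []
    for i in range(n):
        # With a causal mask, token i attends only to tokens 0..i.
        last = i + 1 if causal else n
        scores = [
            sum(a * b for a, b in zip(states[i], states[j])) / math.sqrt(dim)
            for j in range(last)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * states[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        )
    return out


def scale_dimension(states, dim_index, factor):
    """Multiply one hidden-state dimension by `factor`, leaving the others unchanged."""
    return [
        [v * factor if d == dim_index else v for d, v in enumerate(row)]
        for row in states
    ]


def close(a, b, tol=1e-9):
    """Element-wise comparison of two lists of vectors."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Three arbitrary token vectors (illustrative values only).
tokens = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 1.0]]
reordered = tokens[::-1]

full = attention(tokens, causal=False)
full_reordered = attention(reordered, causal=False)
causal_out = attention(tokens, causal=True)
causal_reordered = attention(reordered, causal=True)

# Without a mask, reordering the input merely reorders the output.
order_blind_without_mask = close(full[::-1], full_reordered)
# With a causal mask, the same token yields a different state at a different position.
order_blind_with_mask = close(causal_out[::-1], causal_reordered)

# Hypothetical choice of dimension and factor, purely for illustration.
scaled = scale_dimension(causal_out, dim_index=1, factor=0.5)

print(order_blind_without_mask, order_blind_with_mask)
```

In this toy setting the causal mask is the only source of positional information, which mirrors the abstract's point that position-specific hidden states arise even apart from position embeddings. Which dimension to scale in a real model, and by how much, is not something this sketch determines.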