The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but it is order-invariant. Incorporating position encoding (PE) makes it possible to address tokens by position, such as attending to the $i$-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the $i$-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing, such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting, and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.
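The core idea — incrementing position only on tokens the model selects — can be sketched in a few lines. The sketch below assumes the gate is a sigmoid of query–key dot products, so each past token contributes a fractional increment in $(0, 1)$ to the position count; the function names and shapes are illustrative, not the paper's reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cope_positions(q, K):
    """Contextual positions of past tokens relative to one query.

    q: query vector of shape [d]; K: past key matrix of shape [T, d].
    Gate g_j = sigmoid(q . k_j) decides (softly) whether token j counts
    as a position increment. The contextual position of token j is the
    sum of gates from j up to the current token, so positions are
    fractional and depend on content, not just token counts.
    """
    gates = sigmoid(K @ q)                     # one gate per past token, in (0, 1)
    # reverse cumulative sum: position of token j sums gates over j..T-1
    positions = np.cumsum(gates[::-1])[::-1]
    return gates, positions
```

Because the resulting positions are fractional, a full attention layer would interpolate between the embeddings of the two nearest integer positions before adding them to the attention logits.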