Position encoding has recently proven effective in the transformer architecture. It provides valuable supervision for modeling dependencies between elements at different positions of a sequence. In this paper, we first investigate various methods for integrating positional information into the learning process of transformer-based language models. Then, we propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage positional information. Specifically, RoPE encodes the absolute position with a rotation matrix while incorporating explicit relative position dependency in the self-attention formulation. Notably, RoPE offers valuable properties, including flexibility in sequence length, inter-token dependency that decays with increasing relative distance, and the ability to equip linear self-attention with relative position encoding. Finally, we evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets. Our experiments show that it consistently outperforms its alternatives. Furthermore, we provide a theoretical analysis to explain some of the experimental results. RoFormer is already integrated into Huggingface: \url{https://huggingface.co/docs/transformers/model_doc/roformer}.
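The central claim above, that rotating each vector by an angle tied to its absolute position yields attention scores that depend only on relative position, can be illustrated with a minimal NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the helper names `rope_rotate` and `rope_score` are hypothetical, the pairing of adjacent feature dimensions is one possible layout, and the frequency schedule `base ** (-2i/d)` with `base = 10000` is the commonly used convention rather than something stated in the abstract.

```python
import numpy as np


def rope_rotate(x, position, base=10000.0):
    """Rotate consecutive feature pairs of x by position-dependent angles.

    Hypothetical helper for illustration. `base` and the frequency
    schedule are assumptions (the commonly used convention).
    """
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    # One frequency per 2-D pair: theta_i = base^(-2i/d), i = 0 .. d/2 - 1.
    theta = base ** (-np.arange(0, d, 2) / d)
    angles = position * theta
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    # Standard 2-D rotation applied independently to each (even, odd) pair.
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def rope_score(q, k, m, n):
    """Unnormalized attention score between a query at position m and a key at position n."""
    return float(np.dot(rope_rotate(q, m), rope_rotate(k, n)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k = rng.standard_normal(8), rng.standard_normal(8)
    # Both pairs of positions have the same offset n - m = 4,
    # so the two scores should agree up to floating-point error.
    print(rope_score(q, k, 3, 7), rope_score(q, k, 103, 107))
```

Because every rotation is orthogonal, the vector norm is unchanged, and the inner product of two rotated vectors depends only on the difference of their rotation angles. That is why shifting both positions by the same amount leaves the score unchanged in this sketch.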