Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, encoding methods designed for the vanilla transformer are not always directly applicable to a linear transformer, because the latter requires the query and key representations to be decomposed into separate kernel functions. Moreover, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we unify a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms based on unitary transformations. Our formulation yields a principled framework for developing new relative positional encoding methods that preserve linear space-time complexity. Instantiated with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encodings for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. It also offers a general paradigm for designing a broader range of relative positional encoding methods applicable to linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe.
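The core idea, that a per-token unitary transformation can encode relative position without breaking the linear-attention factorization, can be illustrated with a small sketch. It is not the LRPE code from the linked repository. It assumes a RoPE-style block-diagonal rotation as the unitary (real orthogonal) transformation and the elu(x)+1 kernel feature map. The helper names (`rotation_matrix`, `feature_map`, `quadratic_numerator`, `linear_numerator`) are hypothetical. For this family, `W_s^T W_t = W_{t-s}`, so the score between positions s and t depends only on the offset t - s. Because each `W` acts on a single token's features, the usual linear-attention reordering (accumulating a d × d_v state) still applies. Normalization is omitted for brevity.

```python
import numpy as np


def rotation_matrix(pos: int, thetas: np.ndarray) -> np.ndarray:
    """Hypothetical unitary transform W_pos: block-diagonal 2x2 rotations by pos * theta_i.

    Each block is orthogonal, so W_s^T W_t = W_{t - s} (relative offset only).
    """
    d = 2 * len(thetas)
    w = np.zeros((d, d))
    for i, theta in enumerate(thetas):
        c, s = np.cos(pos * theta), np.sin(pos * theta)
        w[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return w


def feature_map(x: np.ndarray) -> np.ndarray:
    """Assumed kernel feature map: elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def quadratic_numerator(q, k, v, thetas):
    """O(n^2) causal attention numerator using transformed queries and keys."""
    n = q.shape[0]
    out = np.zeros((n, v.shape[1]))
    for i in range(n):
        qi = rotation_matrix(i, thetas) @ feature_map(q[i])
        for j in range(i + 1):
            kj = rotation_matrix(j, thetas) @ feature_map(k[j])
            out[i] += (qi @ kj) * v[j]
    return out


def linear_numerator(q, k, v, thetas):
    """O(n) causal numerator: keep a running sum of outer(W_j phi(k_j), v_j)."""
    n, d = q.shape
    state = np.zeros((d, v.shape[1]))
    out = np.zeros((n, v.shape[1]))
    for i in range(n):
        ki = rotation_matrix(i, thetas) @ feature_map(k[i])
        state += np.outer(ki, v[i])  # state now covers positions j <= i
        qi = rotation_matrix(i, thetas) @ feature_map(q[i])
        out[i] = qi @ state
    return out


def main() -> None:
    rng = np.random.default_rng(0)
    d, n = 8, 6
    thetas = 10000.0 ** (-np.arange(d // 2) / (d // 2))

    # Same offset (t - s = 3) at different absolute positions gives the same score.
    qv, kv = rng.normal(size=d), rng.normal(size=d)
    s1 = (rotation_matrix(2, thetas) @ qv) @ (rotation_matrix(5, thetas) @ kv)
    s2 = (rotation_matrix(7, thetas) @ qv) @ (rotation_matrix(10, thetas) @ kv)
    print(np.isclose(s1, s2))  # expected: True

    # The linear-time computation matches the quadratic one.
    q, k, v = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, 3))
    print(np.allclose(quadratic_numerator(q, k, v, thetas),
                      linear_numerator(q, k, v, thetas)))  # expected: True


if __name__ == "__main__":
    main()
```

The rotation is only one member of the unitary family. Any transform satisfying `W_s^T W_t = W_{t-s}` would leave both checks in `main()` unchanged. That is the property the framework relies on to keep space-time complexity linear.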