Since its introduction, the Transformer has taken the field of natural language processing (NLP) by storm, owing to its superior ability to model complex dependencies in sequences. Pretrained language models (PLMs) based on the Transformer have achieved great success across almost all NLP tasks, yet they all suffer from a preset length limit and can hardly extend this success to sequences longer than those seen during training; this is the length extrapolation problem. Length extrapolation has attracted great interest among researchers because it is a core feature of human language capacity. To enhance the length extrapolation of Transformers, a plethora of methods have been proposed, most of them focusing on extrapolatable position encodings. In this article, we provide an organized and systematic review of these research efforts in a unified notation from a position encoding perspective, aiming to give the reader a deep understanding of existing methods and to stimulate future research.