Positional Encodings (PEs) are used to inject word-order information into transformer-based language models. While they can significantly enhance the quality of sentence representations, their specific contribution to language models is not fully understood, especially given recent findings that various positional encodings are insensitive to word order. In this work, we conduct a systematic study of positional encodings in \textbf{Bidirectional Masked Language Models} (BERT-style), which complements existing work in three aspects: (1) we uncover the core function of PEs by identifying two common properties, Locality and Symmetry; (2) we show that these two properties are closely correlated with performance on downstream tasks; (3) we quantify the weakness of current PEs by introducing two new probing tasks, on which current PEs perform poorly. We believe that these results provide a basis for developing better PEs for transformer-based language models. The code is available at \faGithub~\url{https://github.com/tigerchen52/locality_symmetry}.