This study reports a counterintuitive finding: positional encoding enhances the learning of recurrent neural networks (RNNs). Positional encoding is a high-dimensional representation of the time indices of input data. Most famously, positional encoding complements the capabilities of Transformer neural networks, which lack an inherent mechanism for representing the order of the data. By contrast, RNNs can encode the temporal information of data points on their own, which makes their use of positional encoding seemingly redundant. Nonetheless, investigations on synthetic benchmarks reveal an advantage of coupling positional encoding with RNNs, especially when handling a large vocabulary that yields low-frequency tokens. Further analysis reveals that these low-frequency tokens destabilize the gradients of vanilla RNNs, and that positional encoding resolves this instability. These results shed new light on the utility of positional encoding beyond its canonical role as a timekeeper for Transformers.
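To illustrate what "a high-dimensional representation of time indices" can look like, here is a minimal sketch. It assumes the sinusoidal scheme popularized by Transformers, and it assumes the encoding is concatenated to each token embedding before the result is fed to the RNN; the abstract specifies neither choice. The function names `sinusoidal_encoding` and `concat_position` are hypothetical.

```python
import math


def sinusoidal_encoding(num_steps, dim):
    """Return a num_steps x dim table of sinusoidal positional encodings.

    Assumption: the sin/cos scheme with base 10000, as commonly used with
    Transformers. The abstract does not state which encoding is used.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for t in range(num_steps):
        row = []
        for i in range(dim // 2):
            # Wavelengths grow geometrically with the channel index i.
            angle = t / (10000 ** (2 * i / dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def concat_position(embeddings, encoding):
    """Concatenate each time step's embedding with its positional encoding.

    Assumption: concatenation is one plausible way to couple the encoding
    with the RNN input; element-wise addition would be another.
    """
    return [emb + pos for emb, pos in zip(embeddings, encoding)]


# Four time steps of a toy 2-dimensional embedding, extended by 6 position channels.
pe = sinusoidal_encoding(num_steps=4, dim=6)
rnn_inputs = concat_position([[0.5, 0.5]] * 4, pe)
```

Each row of `rnn_inputs` then carries both the token's embedding and a signal that depends only on the time index, which is the extra information the RNN would receive under these assumptions.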