Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce Random Float Sampling (RFS), a simple yet effective position encoding (PE) strategy that generalizes well to lengths unseen during pretraining or fine-tuning. Instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, exposing the model to diverse indices during training and thereby avoiding out-of-distribution (OOD) behavior at unseen lengths. Since assigning indices to tokens is a common and fundamental step in widely used PEs, RFS can easily be incorporated into, for instance, absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness: RFS achieves superior performance on length generalization tasks as well as zero-shot commonsense reasoning benchmarks.
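The core idea — replacing discrete integer position indices with sorted continuous samples fed into a standard PE — can be sketched as follows. This is a minimal illustration under assumptions not stated in the abstract: the uniform sampling range `max_pos`, the sorting step, and the pairing with sinusoidal encoding are all hypothetical choices for exposition, not the paper's exact scheme.

```python
import numpy as np

def rfs_positions(seq_len, max_pos=1024.0, rng=None):
    # Hypothetical RFS sketch: draw seq_len continuous positions uniformly
    # from [0, max_pos) and sort them so left-to-right token order is kept.
    # Training over many draws exposes the model to diverse indices.
    rng = np.random.default_rng() if rng is None else rng
    return np.sort(rng.uniform(0.0, max_pos, size=seq_len))

def sinusoidal_pe(positions, d_model=64):
    # Standard sinusoidal encoding, evaluated at continuous (float)
    # positions rather than integer indices 0..seq_len-1.
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / d_model))
    angles = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

pe = sinusoidal_pe(rfs_positions(16))
print(pe.shape)  # (16, 64)
```

Because the sinusoid accepts any real-valued position, the same construction applies wherever a PE consumes an index, which is why the abstract notes RFS can slot into RoPE or ALiBi as well.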