We investigate the short-context dominance hypothesis: that for most sequences, a short local prefix suffices to predict their next tokens. Using large language models as statistical oracles, we measure the minimum context length (MCL) needed to reproduce accurate full-context predictions across datasets with sequences of varying lengths. For sequences of 1-7k tokens drawn from long-context documents, we consistently find that 75-80% require at most the last 96 tokens. Given this dominance of short-context tokens, we then ask whether it is possible to detect the challenging long-context sequences for which a short local prefix does not suffice. We introduce a practical proxy for MCL, called Distributionally Aware MCL (DaMCL), that requires no knowledge of the actual next token and is compatible with sampling strategies beyond greedy decoding. Our experiments validate that simple thresholding of the metric defining DaMCL achieves high performance in detecting long- vs. short-context sequences. Finally, to counter the bias that short-context dominance induces in LLM output distributions, we develop an intuitive decoding algorithm that leverages our detector to identify and boost tokens that are long-range-relevant. Across Q&A tasks and model architectures, we confirm that mitigating this bias improves performance.
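The MCL measurement described above can be sketched as follows. This is a minimal illustration, not the paper's exact protocol: `predict` stands in for an LLM oracle (here a toy rule-based function, since the abstract does not specify the model interface), and the candidate suffix lengths are hypothetical.

```python
def minimum_context_length(predict, tokens, lengths=(1, 2, 4, 8)):
    """Smallest suffix length L such that predicting from only the last L
    tokens reproduces the full-context prediction (a greedy sketch of MCL).
    Falls back to the full length when no tested suffix suffices."""
    full_pred = predict(tokens)  # prediction with the entire context
    for L in lengths:
        if L >= len(tokens):
            break
        if predict(tokens[-L:]) == full_pred:
            return L
    return len(tokens)  # long-context: short prefixes do not suffice

# Toy oracle: echoes the most recent token, unless token 0 (a "long-range
# marker") appears anywhere in context, in which case it predicts 99.
def toy_predict(ctx):
    return 99 if 0 in ctx else ctx[-1]

print(minimum_context_length(toy_predict, [5, 6, 7, 7]))  # → 1 (short-context)
print(minimum_context_length(toy_predict, [0, 6, 7, 7]))  # → 4 (long-context)
```

A real measurement would replace `toy_predict` with a call to a language model's next-token argmax; the abstract's DaMCL variant instead compares output distributions, avoiding the need for the true next token.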