This work elicits the inherent ability of large language models (LLMs) to handle long contexts without fine-tuning. The limited length of training sequences may restrict the application of LLMs to long input sequences at inference time. In this work, we argue that existing LLMs already have inherent capabilities for handling long contexts. Based on this argument, we suggest letting LLMs extend their context window by themselves to fully utilize this inherent ability. We propose Self-Extend to stimulate LLMs' long-context handling potential. The basic idea is to construct bi-level attention information: the group level and the neighbor level. Both levels are computed by the original model's self-attention, which means the proposed method does not require any training. With only four lines of code modification, the proposed method can extend the context window of existing LLMs without any fine-tuning. We conduct comprehensive experiments, and the results show that the proposed method can effectively extend the context window length of existing LLMs.
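To make the bi-level idea concrete, the following is a minimal sketch of how relative positions might be assigned at the two levels. The abstract does not give the exact mapping, so the floor-division grouping, the shift term, and the names `self_extend_rel_pos`, `max_rel_pos`, `group_size`, and `neighbor_window` are all assumptions made for illustration. This is not the four-line modification itself.

```python
def self_extend_rel_pos(q_pos: int, k_pos: int,
                        group_size: int, neighbor_window: int) -> int:
    """Relative position used when a query at q_pos attends to a key at k_pos.

    Assumed mapping (not stated in the abstract):
      - neighbor level: keys closer than `neighbor_window` keep their exact distance;
      - group level: farther keys use floor-divided (grouped) positions, shifted so
        that group-level values never fall below `neighbor_window`.
    """
    if k_pos > q_pos:
        raise ValueError("causal attention: the key must not come after the query")
    distance = q_pos - k_pos
    if distance < neighbor_window:
        # Neighbor level: unchanged relative position.
        return distance
    # Group level: coarse positions shared by `group_size` consecutive tokens.
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift


def max_rel_pos(seq_len: int, group_size: int, neighbor_window: int) -> int:
    """Largest relative position seen by the last token of a sequence."""
    q_pos = seq_len - 1
    return max(self_extend_rel_pos(q_pos, k, group_size, neighbor_window)
               for k in range(seq_len))


if __name__ == "__main__":
    # A 32-token input with group size 4 and a neighbor window of 8:
    # the largest relative position is 13, versus 31 with ordinary attention.
    print(max_rel_pos(seq_len=32, group_size=4, neighbor_window=8))
```

Under these assumptions, a model that only saw relative positions below 16 during training would never be asked to use an unseen position on this 32-token input, which is one way to read the claim that no training is required.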