Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications. However, achieving long context lengths with the conventional transformer architecture requires substantial training and inference resources. In this paper, we present FocusLLM, a framework designed to extend the context length of any decoder-only LLM, enabling the model to focus on relevant information from very long sequences. FocusLLM processes long text inputs by dividing them into chunks based on the model's original context length, alleviating the issue of attention distraction. It then appends the local context to each chunk as a prompt, extracts essential information from every chunk through a novel parallel decoding mechanism, and ultimately integrates the extracted information into the local context. FocusLLM stands out for its training efficiency and versatility: trained on inputs of only 8K tokens at a fraction of the training cost of previous methods, FocusLLM achieves superior performance on downstream long-context tasks and maintains strong language modeling ability on extremely long texts of up to 400K tokens. Our code is available at https://github.com/leezythu/FocusLLM.
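As a rough illustration of the chunk-and-parallel-decode data flow described above, the following is a minimal sketch, not the authors' implementation. The names `ToyDecoder`, `focus_sketch`, `CHUNK_LEN`, and `LOCAL_LEN` are hypothetical; in FocusLLM itself the decoder would be a frozen decoder-only LLM rather than a toy module, and the aggregation of extracted information is more elaborate than the simple mean used here.

```python
# Conceptual sketch of chunking a long input, decoding all chunks in parallel
# with the local context appended, and aggregating the extracted states.
import torch
import torch.nn as nn

CHUNK_LEN = 8   # stand-in for the model's original context length
LOCAL_LEN = 4   # length of the local context appended to every chunk
HIDDEN = 32

class ToyDecoder(nn.Module):
    """Placeholder for a decoder-only LLM: embeds tokens and returns hidden states."""
    def __init__(self, vocab=1000, hidden=HIDDEN):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)

    def forward(self, ids):
        return self.layer(self.embed(ids))  # (batch, seq, hidden)

def focus_sketch(long_ids: torch.Tensor, local_ids: torch.Tensor, decoder: nn.Module):
    # 1) Split the long input into chunks of the model's native context length.
    chunks = [c for c in long_ids.split(CHUNK_LEN) if c.numel() == CHUNK_LEN]
    # 2) Append the local context to every chunk so all chunks can be
    #    decoded in one batched (parallel) pass with the same prompt.
    batch = torch.stack([torch.cat([c, local_ids]) for c in chunks])
    hidden = decoder(batch)                    # (n_chunks, CHUNK_LEN + LOCAL_LEN, hidden)
    # 3) Keep the hidden states at the local-context positions as the
    #    information extracted from each chunk.
    extracted = hidden[:, -LOCAL_LEN:, :]      # (n_chunks, LOCAL_LEN, hidden)
    # 4) Aggregate across chunks (a simple mean here) for integration
    #    back into the local context's final prediction.
    return extracted.mean(dim=0)               # (LOCAL_LEN, hidden)

if __name__ == "__main__":
    torch.manual_seed(0)
    long_ids = torch.randint(0, 1000, (40,))     # a "long" document
    local_ids = torch.randint(0, 1000, (LOCAL_LEN,))
    print(focus_sketch(long_ids, local_ids, ToyDecoder()).shape)  # torch.Size([4, 32])
```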