When applied to long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restrict the attention mechanism to apply only within each window, and reuse the positional embeddings across the windows. Our main results test the PCW approach on in-context learning with models ranging in size from 750 million to 178 billion parameters, and show substantial improvements for tasks with diverse input and output spaces. We show additional benefits in other settings where long context windows may be beneficial: multi-hop questions and retrieval-augmented question answering with multiple retrieved documents. Our results highlight Parallel Context Windows as a promising method for applying off-the-shelf LLMs in a range of settings that require long text sequences. We make our code publicly available at https://github.com/ai21labs/parallel-context-windows.
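To make the two ingredients named above concrete (attention restricted to each window, and positional embeddings reused across windows), the following is a minimal sketch of how the position ids and the attention mask could be laid out. It is not the authors' implementation; the released repository is the reference. The function name `pcw_layout` and its parameters are hypothetical, and the handling of the trailing task tokens (they attend to every window and are positioned after the longest window) is an assumption that the abstract does not spell out.

```python
def pcw_layout(window_lengths, n_task_tokens):
    """Sketch of a parallel-window layout (hypothetical helper, not the PCW codebase).

    Returns (position_ids, mask), where mask[i][j] is True when token i
    may attend to token j. Tokens are ordered window by window, followed
    by the task tokens (for example, the test input in in-context learning).
    """
    n_ctx = sum(window_lengths)
    total = n_ctx + n_task_tokens
    position_ids = []
    mask = [[False] * total for _ in range(total)]

    start = 0
    for length in window_lengths:
        # Reuse positions: every window starts counting from 0 again.
        position_ids.extend(range(length))
        # Restrict attention: causal attention inside this window only.
        for i in range(start, start + length):
            for j in range(start, i + 1):
                mask[i][j] = True
        start += length

    # Assumption: task tokens continue after the longest window's positions
    # and attend to all context tokens plus earlier task tokens.
    first_task_pos = max(window_lengths, default=0)
    for t in range(n_task_tokens):
        i = n_ctx + t
        position_ids.append(first_task_pos + t)
        for j in range(i + 1):
            mask[i][j] = True

    return position_ids, mask


if __name__ == "__main__":
    pos, mask = pcw_layout([3, 3], 2)
    print(pos)  # [0, 1, 2, 0, 1, 2, 3, 4]
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

With two windows of three tokens each, no position id exceeds 4 even though the sequence holds eight tokens, which is how a layout of this kind can keep a longer input within the range of positions the model saw during training.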