Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs to understand extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences with a limited context window and to effectively capture long-distance dependencies. Without any training, InfLLM enables LLMs that are pre-trained on sequences of only a few thousand tokens to achieve performance comparable to competitive baselines that continually train these LLMs on long sequences. Even when the sequence length is scaled to $1,024$K, InfLLM still effectively captures long-distance dependencies. Our code can be found at \url{https://github.com/thunlp/InfLLM}.
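The block-memory lookup described above can be sketched in a few lines. The following is a minimal single-query NumPy illustration, not the paper's actual implementation: the distant key-value cache is partitioned into fixed-size memory units, each unit is summarized by a representative key (the mean key here is an assumption for illustration), and only the top-$k$ query-relevant units plus a recent local window enter the attention computation. Function and parameter names (`memory_attention`, `block_size`, `top_k`, `local`) are hypothetical, not from the InfLLM codebase.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend(q, K, V):
    # Standard scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(q.shape[0])
    return softmax(scores) @ V

def memory_attention(q, K, V, block_size=4, top_k=2, local=8):
    """Attention over a bounded window: recent tokens plus top-k memory units.

    Hypothetical sketch of the memory-lookup idea; mean-pooled block keys
    serve as unit representatives (an illustrative assumption).
    """
    n = K.shape[0]
    if n <= local:
        return attend(q, K, V)
    # Split the cache into a distant part (stored as memory) and a local window.
    Kd, Vd = K[:n - local], V[:n - local]
    Kl, Vl = K[n - local:], V[n - local:]
    # Partition the distant context into fixed-size memory units.
    m = Kd.shape[0] // block_size
    Kd, Vd = Kd[:m * block_size], Vd[:m * block_size]  # drop ragged tail for simplicity
    blocks_K = Kd.reshape(m, block_size, -1)
    blocks_V = Vd.reshape(m, block_size, -1)
    # One representative key per unit; score units against the current query.
    reps = blocks_K.mean(axis=1)
    idx = np.argsort(reps @ q)[-top_k:]  # indices of the top-k relevant units
    sel_K = blocks_K[idx].reshape(-1, K.shape[1])
    sel_V = blocks_V[idx].reshape(-1, V.shape[1])
    # Attend only over the selected units and the local window.
    return attend(q, np.vstack([sel_K, Kl]), np.vstack([sel_V, Vl]))
```

Because attention is permutation-invariant over key-value pairs, selecting every unit (`top_k = m`) recovers full attention exactly; shrinking `top_k` bounds the attended context regardless of total sequence length, which is the source of the efficiency gain.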