Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of the attention mechanism and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance long-context language modeling by utilizing an external retriever for historical information retrieval. MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantically relevant chunks. Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs. More importantly, MemLong can extend the context length on a single NVIDIA 3090 GPU from 4k to 80k tokens. Our code is available at https://github.com/Bui1dMySea/MemLong
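To make the retrieval-memory idea concrete, the following is a minimal, hypothetical sketch of a non-differentiable "ret-mem" store: past context is split into chunks, each chunk is stored with an embedding, and at generation time the top-k most similar chunks are retrieved to augment attention. The class and method names, embedding scheme, and sizes are illustrative assumptions, not MemLong's actual implementation.

```python
import numpy as np

class RetMem:
    """Toy retrieval memory: maps chunk embeddings to raw chunks
    and returns the top-k chunks most similar to a query embedding.
    (Illustrative sketch only; not the MemLong codebase.)"""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))  # one row per stored chunk embedding
        self.chunks = []                # raw chunk payloads, index-aligned with keys

    def add(self, embedding: np.ndarray, chunk) -> None:
        # L2-normalize so that a dot product equals cosine similarity
        emb = embedding / np.linalg.norm(embedding)
        self.keys = np.vstack([self.keys, emb])
        self.chunks.append(chunk)

    def retrieve(self, query: np.ndarray, k: int = 2):
        q = query / np.linalg.norm(query)
        scores = self.keys @ q                 # cosine similarity to every chunk
        top = np.argsort(-scores)[:k]          # indices of the k best matches
        return [self.chunks[i] for i in top]
```

Because retrieval happens outside the model's computation graph, no gradients flow through the memory, which is what makes the module non-differentiable and cheap to scale to long histories.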