Long-context processing is a critical capability that constrains the applicability of large language models (LLMs). Although various methods have been proposed to enhance the long-context capability of LLMs, they have been developed in isolation and lack systematic analysis and integration of their strengths, which hinders further progress. In this paper, we introduce UniMem, a Unified framework that reformulates existing long-context methods from the perspective of Memory augmentation of LLMs. UniMem is characterized by four core dimensions: Memory Management, Memory Writing, Memory Reading, and Memory Injection, which enable researchers to explore long-context methods systematically. We reformulate 16 existing methods within UniMem and analyze four representative ones, Transformer-XL, Memorizing Transformer, RMT, and Longformer, recasting each into an equivalent UniMem form to reveal its design principles and strengths. Based on these analyses, we propose UniMix, a novel approach that integrates the strengths of these algorithms. Experimental results show that UniMix achieves superior performance in handling long contexts, with significantly lower perplexity than the baselines.
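To make the four dimensions concrete, here is a minimal sketch, assuming a plain-Python interface of our own design: all class and method names, signatures, and the toy FIFO policy are illustrative assumptions, not the paper's implementation. It expresses Memory Writing, Management, Reading, and Injection as separate operations that different long-context methods instantiate differently.

```python
# Illustrative sketch only (NOT the paper's code): one way to express
# UniMem's four dimensions as an interface. All names and signatures
# are our own assumptions.
from abc import ABC, abstractmethod
from collections import deque
from typing import Any, Deque, List


class UniMemLayer(ABC):
    """Abstract view of a memory-augmented Transformer layer under UniMem."""

    @abstractmethod
    def memory_write(self, hidden_states: List[Any]) -> None:
        """Memory Writing: decide which token representations enter memory."""

    @abstractmethod
    def memory_manage(self) -> None:
        """Memory Management: decide which entries to keep or evict."""

    @abstractmethod
    def memory_read(self, query: Any) -> List[Any]:
        """Memory Reading: retrieve entries relevant to the current query."""

    @abstractmethod
    def memory_inject(self, layer_output: Any, retrieved: List[Any]) -> Any:
        """Memory Injection: fuse retrieved entries into the layer's output."""


class FifoCacheLayer(UniMemLayer):
    """Toy instantiation with a fixed-size FIFO cache, loosely in the spirit
    of Transformer-XL's segment-level recurrence (illustrative only)."""

    def __init__(self, capacity: int = 4) -> None:
        self.memory: Deque[Any] = deque(maxlen=capacity)  # bounded FIFO

    def memory_write(self, hidden_states: List[Any]) -> None:
        self.memory.extend(hidden_states)  # write every state

    def memory_manage(self) -> None:
        pass  # eviction handled implicitly by the bounded deque

    def memory_read(self, query: Any) -> List[Any]:
        return list(self.memory)  # read everything (no retrieval scoring)

    def memory_inject(self, layer_output: Any, retrieved: List[Any]) -> Any:
        # A real model would attend over `retrieved`; here we concatenate.
        return retrieved + [layer_output]


layer = FifoCacheLayer(capacity=4)
layer.memory_write(["h1", "h2", "h3"])
print(layer.memory_read(query="h4"))  # ['h1', 'h2', 'h3']
```

Under this framing, a method such as Memorizing Transformer would replace the FIFO read with a k-NN lookup, while RMT would write compressed summary tokens rather than raw states; the interface stays the same and only the per-dimension choices change.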