Extending the context length of Transformer-based large language models (LLMs) to improve their comprehension capabilities is often limited by computational resources and bounded memory capacity. This work introduces Recurrent Context Compression (RCC), a method designed to efficiently expand the context window of LLMs within constrained storage space. We also investigate the degraded model responses that arise when both instructions and context are compressed in downstream tasks, and propose an instruction reconstruction method to mitigate this problem. We validate the effectiveness of our approach on multiple tasks, achieving a compression rate of up to 32x on text reconstruction with a BLEU4 score close to 0.95, and nearly 100\% accuracy on a passkey retrieval task with a sequence length of 1M. Finally, our method achieves competitive performance on long-text question answering compared to non-compressed methods, while significantly saving storage resources during long-text inference. Our code, models, and demo are available at https://github.com/WUHU-G/RCC_Transformer