We propose the In-context Autoencoder (ICAE), which leverages a large language model (LLM) to compress a long context into a small number of compact memory slots that the LLM can directly condition on for various purposes. ICAE is first pretrained with both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. It is then fine-tuned on instruction data to produce desirable responses to various prompts. Experiments show that our lightweight ICAE, which introduces about 1% additional parameters, effectively achieves $4\times$ context compression based on Llama, reducing both latency and GPU memory cost during inference, and yields interesting insights into memorization as well as potential for scalability. These promising results suggest a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, highlighting ICAE's significant implications for addressing the long context problem and motivating further research on LLM context management. Our data, code and models are available at https://github.com/getao/icae.
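To make the idea of memory slots concrete, the following is a minimal toy sketch in pure Python, not ICAE's implementation. It assumes (beyond what is stated above) that a fixed number of memory-token embeddings is appended after the context and that the encoder's output states at those positions serve as the memory slots. The single projection-free attention layer, the dimensions, and the names `compress` and `causal_self_attention` are hypothetical stand-ins for a full LLM encoder. With 16 context tokens and 4 memory tokens, the ratio corresponds to the $4\times$ compression reported above.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(seq):
    """Hypothetical single-head, projection-free causal attention:
    position i attends to positions 0..i and returns a weighted average."""
    d = len(seq[0])
    out = []
    for i, q in enumerate(seq):
        scores = [dot(q, seq[j]) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        out.append([sum(w * seq[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return out


def compress(context, memory_tokens):
    """Assumed scheme: append memory tokens after the context and keep
    only the output states at the memory positions as memory slots."""
    seq = context + memory_tokens
    hidden = causal_self_attention(seq)
    return hidden[len(context):]


# Usage: 16 context embeddings compressed into 4 memory slots.
random.seed(0)
d, ctx_len, num_slots = 8, 16, 4
context = [[random.gauss(0, 1) for _ in range(d)] for _ in range(ctx_len)]
memory = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_slots)]
slots = compress(context, memory)
print(len(slots), ctx_len / len(slots))  # 4 4.0
```

Because attention is causal and the memory tokens sit after the context, every slot can attend to every context token, which is what allows a short sequence of slots to summarize a longer input. In a trained system, the memory embeddings and encoder would be learned (for example, with the autoencoding and language modeling objectives described above), and the LLM would attend to the slots instead of the original context.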