We propose the In-context Autoencoder (ICAE), which leverages the power of a large language model (LLM) to compress a long context into short, compact memory slots that the LLM can directly condition on for various purposes. ICAE is first pretrained with both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context; it is then fine-tuned on instruction data to produce desirable responses to various prompts. Experiments demonstrate that our lightweight ICAE, which introduces fewer than 1% additional parameters, effectively achieves 4× context compression based on Llama, offering advantages in both latency and GPU memory cost during inference, and providing interesting insights into memorization as well as potential for scalability. These promising results suggest a novel perspective on the connection between working memory in cognitive science and representation learning in LLMs, revealing ICAE's significant implications for addressing the long context problem and suggesting further research in LLM context management. Our data, code and model are released at https://github.com/getao/icae.
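To make the 4× compression figure concrete, the following is a minimal sketch of the token arithmetic only, not of the ICAE model itself. The function names, the rounding-up rule, and the example lengths are illustrative assumptions; only the compression ratio of 4 comes from the abstract.

```python
import math


def memory_slot_count(context_tokens: int, compression_ratio: int = 4) -> int:
    """Return how many memory slots a context would occupy at the given ratio.

    Rounding up is an assumption made for this sketch; the abstract only
    states the overall 4x ratio.
    """
    if context_tokens < 0:
        raise ValueError("context_tokens must be non-negative")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be at least 1")
    return math.ceil(context_tokens / compression_ratio)


def conditioned_length(context_tokens: int, prompt_tokens: int,
                       compression_ratio: int = 4) -> int:
    """Sequence length the LLM conditions on: memory slots plus the prompt."""
    return memory_slot_count(context_tokens, compression_ratio) + prompt_tokens


if __name__ == "__main__":
    # Hypothetical sizes: a 512-token context followed by a 32-token prompt.
    print(memory_slot_count(512))        # slots replacing the raw context
    print(conditioned_length(512, 32))   # slots + prompt tokens
```

Under these assumptions, the LLM conditions on a quarter as many positions for the context, which is where the latency and GPU memory savings during inference would come from.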