Due to the rapid generation and dissemination of information, large language models (LLMs) quickly become outdated despite their enormous development costs. To meet the crucial need to keep models updated, online learning has emerged as a critical tool for deploying LLMs in real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propose Memory of Amortized Contexts (MAC), an efficient and effective online adaptation framework for LLMs with strong knowledge retention. Specifically, we use a feature-extraction and memory-augmentation approach that compresses the information in new documents into compact modulations stored in a memory bank. When answering questions, our model attends to this memory bank and extracts the relevant knowledge from it. To learn informative modulations efficiently, we employ amortization-based meta-learning, which replaces an otherwise required optimization process with a single forward pass of the encoder. Subsequently, we learn to select and aggregate documents into a single modulation by conditioning on the question, allowing us to adapt a frozen language model at test time without further gradient updates. Our experiments demonstrate the superiority of MAC in multiple aspects, including online adaptation performance as well as time and memory efficiency. In addition, we show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation (RAG). Code is available at: https://github.com/jihoontack/MAC.
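The mechanism sketched in the abstract can be illustrated schematically. The sketch below is not the paper's implementation: the encoder is stood in for by mean pooling, the embeddings are random, and all names (`amortize`, `aggregate`, `DIM`) are hypothetical. It only illustrates the two ideas the abstract names: a single forward pass compresses each document into a compact modulation stored in a memory bank, and a question-conditioned attention step aggregates the bank into one modulation for the frozen LM.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16  # modulation dimension (illustrative)

def amortize(doc_tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the amortization encoder: a single forward pass
    (here, mean pooling over token embeddings) compresses a document
    into one compact modulation vector -- no per-document optimization."""
    return doc_tokens.mean(axis=0)

def aggregate(question: np.ndarray, memory: np.ndarray) -> np.ndarray:
    """Attend over the memory bank, conditioned on the question, and
    aggregate the stored modulations into a single one."""
    scores = memory @ question / np.sqrt(DIM)       # question-document affinity
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax attention weights
    return weights @ memory                         # convex combination of modulations

# Build a memory bank from three "documents" (random token embeddings here),
# each compressed by one encoder forward pass.
memory = np.stack([amortize(rng.normal(size=(5, DIM))) for _ in range(3)])

# At test time: aggregate the bank into one modulation for the question.
question = rng.normal(size=DIM)
modulation = aggregate(question, memory)  # would condition the frozen LM
```

In the actual method, the resulting modulation conditions a frozen LLM (e.g., as a soft prompt or adapter parameters), which is what lets adaptation happen without any test-time gradient updates.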