Large Language Models (LLMs) need to adapt to continuous changes in data, tasks, and user preferences. Due to their massive size and the high cost of training, LLMs are not suited to frequent retraining. However, updates are necessary to keep them in sync with rapidly evolving human knowledge. To address these challenges, this paper proposes the Compression Memory Training (CMT) method, an efficient and effective online adaptation framework for LLMs that features robust knowledge retention capabilities. Inspired by human memory mechanisms, CMT compresses and extracts information from new documents and stores it in a memory bank. When answering queries related to these new documents, the model retrieves and aggregates the relevant document memories from the memory bank to better respond to user questions. The parameters of the LLM itself do not change during training and inference, reducing the risk of catastrophic forgetting. To enhance the encoding, retrieval, and aggregation of memory, we further propose three general and flexible techniques: a memory-aware objective, self-matching, and top-aggregation. Extensive experiments on three continual learning datasets (i.e., StreamingQA, SQuAD, and ArchivalQA) demonstrate that the proposed method improves model adaptability and robustness across multiple base LLMs (e.g., +4.07 EM and +4.19 F1 on StreamingQA with Llama-2-7b).
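The compress-store-retrieve-aggregate loop described above can be sketched in a toy form. This is a minimal illustration only, not the paper's implementation: the class name `CompressionMemoryBank`, the hashed bag-of-tokens "compression", and the similarity-weighted aggregation are all stand-in assumptions for the learned memory encoding, self-matching retrieval, and top-aggregation that CMT actually trains.

```python
import numpy as np

class CompressionMemoryBank:
    """Toy sketch of a CMT-style memory bank (hypothetical API).

    Each new document is "compressed" into one fixed-size vector and
    stored; the base LLM's own parameters are never touched. A query
    retrieves the top-k most similar memories and aggregates them,
    standing in for CMT's self-matching and top-aggregation.
    """

    def __init__(self, dim: int = 64, top_k: int = 2):
        self.dim = dim
        self.top_k = top_k
        self.memories: list[np.ndarray] = []

    def _embed(self, text: str) -> np.ndarray:
        # Toy deterministic-per-run embedding: hash each token into a
        # bucket and L2-normalize. A real system would use a learned
        # compression of the document's hidden states.
        vec = np.zeros(self.dim)
        for tok in text.lower().split():
            vec[hash(tok) % self.dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def write(self, document: str) -> None:
        # "Compress" the new document into a single memory vector.
        self.memories.append(self._embed(document))

    def read(self, query: str) -> np.ndarray:
        # Score all memories against the query (self-matching style),
        # keep the top-k, and return a similarity-weighted aggregate.
        q = self._embed(query)
        sims = np.array([m @ q for m in self.memories])
        top = np.argsort(sims)[::-1][: self.top_k]
        weights = np.exp(sims[top]) / np.exp(sims[top]).sum()
        return sum(w * self.memories[i] for w, i in zip(weights, top))
```

The aggregated vector would then condition the frozen LLM's answer generation; how that conditioning is wired in is part of the method proper and is not reproduced here.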