Prior work on continual knowledge learning (CKL) in large language models (LLMs) has focused predominantly on regularization, architectural modifications, and rehearsal techniques to mitigate catastrophic forgetting. However, these methods inherit the inefficiency of standard training, which applies a uniform weight to every token regardless of its usefulness and can therefore cause unnecessary parameter updates and increased forgetting. To address these shortcomings, we propose a novel CKL approach, the Train-Attention-Augmented Language Model (TAALM), which improves learning efficiency by dynamically predicting a weight for each token based on its usefulness and applying it during training. The method employs a meta-learning framework that optimizes the token-importance predictions, facilitating targeted knowledge updates and minimizing forgetting. We also observe that existing benchmarks do not clearly exhibit the trade-off between learning and retaining, so we propose a new benchmark, \textsc{LAMA-ckl}, to address this issue. In experiments on both the newly introduced and established CKL benchmarks, TAALM achieves state-of-the-art performance over the baselines and shows synergistic compatibility when integrated with previous CKL approaches.
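To make the contrast between uniform and token-weighted training concrete, the following is a minimal sketch of a token-weighted negative log-likelihood. It is an illustration under stated assumptions, not the authors' implementation: the function names `token_log_probs` and `weighted_nll` are hypothetical, the loss is assumed to be an unnormalized weighted sum, and the weights are hand-set placeholders standing in for the values that the meta-learned predictor would produce.

```python
import math


def token_log_probs(logits, targets):
    """Return log p(target) for each position, using a softmax over that row of logits."""
    log_probs = []
    for row, target in zip(logits, targets):
        # Log-sum-exp with max subtraction for numerical stability.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        log_probs.append(row[target] - log_z)
    return log_probs


def weighted_nll(logits, targets, weights):
    """Token-weighted loss: sum_i w_i * (-log p(x_i)).

    Assumption: weights are supplied externally (placeholders here) and the
    sum is left unnormalized; a real system may normalize differently.
    """
    log_probs = token_log_probs(logits, targets)
    return sum(-w * lp for w, lp in zip(weights, log_probs))


# Three positions over a toy vocabulary of size 2.
logits = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]
targets = [0, 0, 1]

uniform_loss = weighted_nll(logits, targets, [1.0, 1.0, 1.0])   # standard training
weighted_loss = weighted_nll(logits, targets, [1.0, 0.0, 0.5])  # down-weights less useful tokens
```

With all weights set to 1 the function reduces to the standard summed language-modeling loss, which is the uniform weighting the abstract criticizes. Setting a token's weight to 0 removes its contribution to the loss and therefore to the gradient.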