KlF: Knowledge Localization and Fusion for Language Model Continual Learning

From arXiv: this version renames the model from Task Skill Localization and Consolidation (TaSL) to Knowledge Localization and Fusion (KlF). It extends the ACL 2024 paper "Continual Dialog State Tracking via Task Skill Localization and Consolidation".

Language model continual learning (CL) has recently attracted significant interest for its ability to adapt large language models (LLMs) to dynamic real-world scenarios without retraining. A major challenge in this domain is catastrophic forgetting, where models lose previously acquired knowledge upon learning new tasks. Existing approaches commonly utilize multiple parameter-efficient fine-tuning (PEFT) blocks to acquire task-specific knowledge, yet these methods are inefficient and fail to leverage potential knowledge transfer across tasks. In this paper, we introduce a novel CL framework for language models, named Knowledge Localization and Fusion (KlF), which boosts knowledge transfer without depending on memory replay. KlF initially segregates the model into 'skill units' based on parameter dependencies, allowing for more precise control. Subsequently, it employs a novel group-wise knowledge localization technique to ascertain the importance distribution of skill units for a new task. By comparing this importance distribution with those from previous tasks, we implement a fine-grained knowledge fusion strategy that retains task-specific knowledge, thereby preventing forgetting, and updates task-shared knowledge, which facilitates bi-directional knowledge transfer. As a result, KlF achieves an optimal balance between retaining prior knowledge and excelling in new tasks. KlF also demonstrates strong generalizability, making it suitable for various base models and adaptable to PEFT methods like LoRA. Furthermore, it offers notable extensibility, supporting enhancements through integration with memory replay techniques. Comprehensive experiments conducted on two CL benchmarks, involving models ranging from 220M to 7B parameters, affirm the effectiveness of KlF and its variants across different settings.
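The localize-then-fuse idea described in the abstract can be illustrated with a toy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the importance proxy (mean |weight × gradient| per skill unit), the top-fraction thresholding rule, the simple averaging of shared units, and the names `unit_importance`, `top_units`, and `fuse` are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def unit_importance(weights, grads):
    """Score one skill unit as the mean |w * g| over its parameters.

    Assumed proxy for importance; the paper's actual metric may differ.
    """
    return mean(abs(w * g) for w, g in zip(weights, grads))


def top_units(scores, fraction=0.5):
    """Return the names of the highest-scoring units (hypothetical threshold rule)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def fuse(old, new, old_scores, new_scores, fraction=0.5):
    """Combine per-unit weights from before and after training on a new task.

    old / new: dict mapping unit name -> list of weights.
    old_scores / new_scores: dict mapping unit name -> importance score.
    """
    important_old = top_units(old_scores, fraction)
    important_new = top_units(new_scores, fraction)
    fused = {}
    for name in old:
        in_old = name in important_old
        in_new = name in important_new
        if in_old and not in_new:
            # Important only for earlier tasks: keep the old weights.
            fused[name] = list(old[name])
        elif in_old and in_new:
            # Important for both (task-shared): blend old and new (assumed average).
            fused[name] = [(a + b) / 2 for a, b in zip(old[name], new[name])]
        else:
            # Important only for the new task, or for neither: take the update.
            fused[name] = list(new[name])
    return fused


# Toy example with four skill units of two parameters each.
old = {"u1": [1.0, 1.0], "u2": [2.0, 2.0], "u3": [3.0, 3.0], "u4": [4.0, 4.0]}
new = {"u1": [1.5, 1.5], "u2": [2.0, 4.0], "u3": [5.0, 5.0], "u4": [4.0, 4.2]}
old_scores = {"u1": 0.9, "u2": 0.8, "u3": 0.1, "u4": 0.05}
new_scores = {"u1": 0.1, "u2": 0.7, "u3": 0.9, "u4": 0.02}

fused = fuse(old, new, old_scores, new_scores)
for name, weights in fused.items():
    print(name, weights)
```

In this toy setup, a unit that matters only to earlier tasks is frozen, a unit that matters to both is blended, and all remaining units accept the new-task update. That mirrors the retain-specific, update-shared behavior the abstract describes, without claiming to reproduce the paper's actual rule.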
