Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs): they sequentially learn a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches constrain each new adapter with respect to the previous ones, targeting either subspace or coordinate-wise interference. In this paper, we propose JumpLoRA, a novel framework that adaptively induces sparsity in Low-Rank Adaptation (LoRA) blocks through JumpReLU gating. The method achieves dynamic parameter isolation, which helps prevent task interference. We demonstrate that our method is highly modular and compatible with LoRA-based CL approaches. Specifically, it significantly boosts the performance of IncLoRA and outperforms the state-of-the-art CL method ELLA.
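To make the gating idea concrete, the following is a minimal illustrative sketch, not the JumpLoRA implementation. It uses the standard JumpReLU definition, JumpReLU_θ(z) = z · H(z − θ) with H the Heaviside step, and assumes (the abstract does not specify this) that the gate acts element-wise on the magnitude of the low-rank update ΔW = BA with a single fixed threshold. The function name `gated_lora_update`, the matrix shapes, and the threshold value are hypothetical.

```python
import numpy as np

def jumprelu(z, theta):
    """JumpReLU: return z where z > theta, and 0 elsewhere."""
    return z * (z > theta)

def gated_lora_update(B, A, theta):
    """Hypothetical sparse LoRA update (an assumption, not the paper's method).

    Forms the low-rank update delta = B @ A, then keeps only the entries
    whose magnitude exceeds theta; all other entries are set to zero.
    """
    delta = B @ A
    # Gate the magnitudes, then restore the original signs.
    return np.sign(delta) * jumprelu(np.abs(delta), theta)

# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2
B = rng.normal(size=(d_out, rank))
A = rng.normal(size=(rank, d_in))

delta_sparse = gated_lora_update(B, A, theta=0.5)
# Fraction of coordinates that the gate switched off for this adapter.
sparsity = float(np.mean(delta_sparse == 0.0))
```

Under these assumptions, coordinates that the gate zeroes out receive no update from the current adapter, which is one way to read "parameter isolation". In a trainable version the threshold would presumably be a learned parameter, and the non-differentiable step would need a surrogate gradient; both are outside the scope of this sketch.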