We revisit continual learning (CL), which enables pre-trained vision transformers (ViTs) to be fine-tuned sequentially on new downstream tasks over time. However, as the scale of these models increases, catastrophic forgetting becomes a more serious challenge. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which adapts a model to downstream tasks by fine-tuning only a small set of trainable parameters, as in low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter in parallel with the pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. Additionally, we propose a scheme to predict task identity with confidence and calibrate the model's outputs accordingly. On ViT-based models, we demonstrate that DualLoRA offers significant advantages over existing CL methods in accuracy, inference speed, and training computational efficiency across multiple benchmarks.
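As a rough illustration of what "LoRA adapters parallel to pre-trained weights" means, the following NumPy sketch adds two low-rank updates to a frozen weight matrix. It is a generic LoRA-style sketch under our own assumptions, not the DualLoRA implementation: the names `lora_delta`, `parallel_adapter_forward`, and `init_adapter` are hypothetical, and the orthogonality constraint, the residual adapter's specific role, the dynamic memory mechanism, and the task-identity prediction are not modeled here.

```python
import numpy as np

def lora_delta(A, B, scale=1.0):
    """Low-rank weight update B @ A with shape (d_out, d_in)."""
    return scale * (B @ A)

def parallel_adapter_forward(x, W0, adapters):
    """Apply the frozen weight W0 plus the sum of parallel low-rank updates."""
    W = W0 + sum(lora_delta(A, B) for A, B in adapters)
    return x @ W.T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 4           # hypothetical layer sizes and rank
W0 = rng.normal(size=(d_out, d_in))  # stands in for a frozen pre-trained weight

def init_adapter():
    # Standard LoRA-style init: small random A, zero B, so the update starts at zero.
    A = rng.normal(size=(r, d_in)) * 0.01
    B = np.zeros((d_out, r))
    return A, B

# Two adapters in parallel with W0 (assumed here to play the two adapter roles).
adapters = [init_adapter(), init_adapter()]

x = rng.normal(size=(4, d_in))       # a batch of 4 token embeddings
y = parallel_adapter_forward(x, W0, adapters)

full_params = W0.size
adapter_params = sum(A.size + B.size for A, B in adapters)
```

With these sizes the two adapters together hold 1024 trainable parameters against 4096 in the frozen matrix, which is the sense in which such fine-tuning is parameter-efficient. Because `B` starts at zero, the layer's output initially matches that of the pre-trained weight alone.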