Continual learning (CL) in vision-language models (VLMs) faces significant challenges in adapting to new tasks while avoiding catastrophic forgetting. Existing methods usually impose a heavy inference burden or rely on external knowledge, whereas Low-Rank Adaptation (LoRA) has shown potential to mitigate these issues by enabling parameter-efficient tuning. However, directly applying LoRA to alleviate catastrophic forgetting is non-trivial. We therefore introduce a novel framework that restructures a single LoRA module into a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting experts from this pool, guided by the semantics of the [CLS] token. In addition, we propose an Activation-Guided Orthogonal (AGO) loss that orthogonalizes the critical parts of the LoRA weights across tasks. Together, sparse composition and orthogonalization require fewer parameter updates, yielding domain-aware learning while minimizing inter-task interference and preserving downstream task performance. Extensive experiments across multiple settings demonstrate state-of-the-art results on all metrics, surpassing the zero-shot upper bound in generalization. Notably, our method reduces trainable parameters by 96.7% compared to the baseline and eliminates reliance on external datasets or task-ID discriminators. The merged LoRAs retain fewer weights and incur no inference latency, making our method computationally lightweight.
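To make the core idea concrete, the sketch below decomposes a rank-r LoRA update B·A into r rank-1 experts b_i·a_iᵀ, gates them against a query vector standing in for the [CLS] token, and composes only the top-k experts into the task update. The class name `Rank1ExpertPool`, the dot-product gating, and the `ago_penalty` surrogate (a plain squared-inner-product penalty between expert vectors, not the paper's activation-weighted formulation) are all illustrative assumptions, not the method's actual implementation.

```python
import random

def outer(u, v):
    """Rank-1 matrix u v^T as nested lists."""
    return [[x * y for y in v] for x in u]

def mat_add(M, N):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(M, N)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class Rank1ExpertPool:
    """Illustrative sketch: one LoRA module viewed as a pool of rank-1 experts.

    Each expert i contributes the rank-1 update b_i a_i^T; a per-expert gate
    vector is scored against a query (a stand-in for the [CLS] embedding),
    and only the top-k experts are summed into the sparse task update.
    """

    def __init__(self, d_out, d_in, num_experts, seed=0):
        rng = random.Random(seed)
        self.b = [[rng.gauss(0.0, 0.02) for _ in range(d_out)] for _ in range(num_experts)]
        self.a = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(num_experts)]
        # One gate vector per expert, matched against the [CLS] query.
        self.gates = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(num_experts)]

    def compose(self, cls_token, k):
        """Select top-k experts by gate score and sum their rank-1 updates."""
        scores = [dot(g, cls_token) for g in self.gates]
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        delta = [[0.0] * len(self.a[0]) for _ in range(len(self.b[0]))]
        for i in top:
            delta = mat_add(delta, outer(self.b[i], self.a[i]))
        return delta, sorted(top)

def ago_penalty(vectors):
    """Toy orthogonality surrogate: sum of squared pairwise inner products.

    Drives the chosen vectors toward mutual orthogonality (zero when all
    pairs are orthogonal); the actual AGO loss additionally weights by
    activation importance, which is omitted here.
    """
    loss = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            loss += dot(vectors[i], vectors[j]) ** 2
    return loss

pool = Rank1ExpertPool(d_out=4, d_in=6, num_experts=8)
cls = [0.1 * j for j in range(6)]          # stand-in [CLS] embedding
delta, chosen = pool.compose(cls, k=2)     # sparse, task-specific update
penalty = ago_penalty([pool.a[i] for i in chosen])
```

Because the update is a sum of plain rank-1 outer products, the selected experts can be merged back into the base weight matrix after training, which is consistent with the abstract's claim of zero added inference latency.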