The "pre-training $\rightarrow$ downstream adaptation" paradigm presents both new opportunities and new challenges for Continual Learning (CL). Although the recent state of the art in CL is achieved through the Parameter-Efficient-Tuning (PET) adaptation paradigm, only prompting has been explored, which limits its application to Transformers. In this paper, we position prompting as one instantiation of PET and propose a unified CL framework with general PET, dubbed Learning-Accumulation-Ensemble (LAE). PET, e.g., using Adapter, LoRA, or Prefix, can adapt a pre-trained model to downstream tasks with fewer parameters and resources. Given a PET method, our LAE framework incorporates it for CL with three novel designs. 1) Learning: the pre-trained model adapts to the new task by tuning an online PET module, along with our adaptation speed calibration to align different PET modules; 2) Accumulation: the task-specific knowledge learned by the online PET module is accumulated into an offline PET module through momentum update; 3) Ensemble: during inference, we construct two experts from the online and offline PET modules (which favor novel and historical tasks, respectively) and ensemble their predictions. We show that LAE is compatible with a battery of PET methods and gains strong CL capability. For example, LAE with Adapter PET surpasses the prior state of the art by 1.3% and 3.6% in last-incremental accuracy on the CIFAR100 and ImageNet-R datasets, respectively.
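The Accumulation and Ensemble steps can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat parameter lists, the momentum coefficient `alpha`, and the rule for combining the two experts (an element-wise maximum over softmax probabilities) are all assumptions made for illustration, since the abstract specifies only "momentum update" and "prediction ensemble".

```python
import math


def momentum_update(offline, online, alpha=0.999):
    """Accumulate online PET parameters into the offline ones.

    Assumed form of the momentum update (an exponential moving average):
        offline <- alpha * offline + (1 - alpha) * online
    Parameters are plain lists of floats here for simplicity.
    """
    return [alpha * off + (1.0 - alpha) * on for off, on in zip(offline, online)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(online_logits, offline_logits):
    """Combine the online and offline experts and return the predicted class.

    Assumption: the experts are merged by taking, per class, the larger of
    the two softmax probabilities; the abstract does not state the rule.
    """
    p_on = softmax(online_logits)
    p_off = softmax(offline_logits)
    combined = [max(a, b) for a, b in zip(p_on, p_off)]
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical usage: one accumulation step, then an ensemble prediction.
offline_params = momentum_update([0.0, 1.0], [1.0, 1.0], alpha=0.9)
predicted_class = ensemble_predict([2.0, 0.0, 0.0], [0.0, 0.0, 3.0])
```

With `alpha` close to 1, the offline module changes slowly and so retains more of what was learned on earlier tasks, while the online module tracks the current task; the ensemble then lets whichever expert is more confident decide the prediction.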