Continual learning requires models to integrate new classes or domains over time while preserving previously acquired knowledge. Within this paradigm, foundation models often achieve strong performance, but they remain subject to the stability-plasticity trade-off: excessive plasticity leads to forgetting of prior knowledge, while excessive stability constrains adaptation. This calls for an effective post-training strategy that introduces minimal yet functional modifications. To address this challenge, we first introduce a new parameter-efficient fine-tuning module, 'Learn and Calibrate' (LuCA), designed to acquire task-specific knowledge through an adapter-calibrator pair, yielding well-refined feature representations. Then, for each task, we deploy a sparse LuCA module on top of the final classification token ([CLS]) just before the classifier, an approach we refer to as 'Token-level Sparse Calibration and Adaptation' (TOSCA). By leaving the foundation model's generalization capabilities intact and adapting exclusively through the last token, our approach achieves a harmonious balance between stability and plasticity while reducing both training and inference complexity. We demonstrate that TOSCA yields state-of-the-art performance while introducing approximately 8x fewer parameters than prior methods.
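To make the design concrete, below is a minimal PyTorch sketch of how a LuCA-style module might operate on the final [CLS] token of a frozen backbone. The specific internals shown here (a bottleneck MLP adapter, a sigmoid-gated calibrator, a residual connection, and names such as `LuCA` and `bottleneck_dim`) are illustrative assumptions, not details given in the abstract.

```python
import torch
import torch.nn as nn

class LuCA(nn.Module):
    """Sketch of a 'Learn and Calibrate' module: an adapter that acquires
    task-specific features plus a calibrator that refines them.

    The bottleneck adapter and sigmoid-gated calibrator are assumptions
    for illustration; the abstract does not specify these internals.
    """
    def __init__(self, dim: int, bottleneck_dim: int = 64):
        super().__init__()
        # Adapter: low-rank bottleneck that learns task-specific knowledge.
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck_dim),
            nn.GELU(),
            nn.Linear(bottleneck_dim, dim),
        )
        # Calibrator: element-wise gate that refines the adapted features.
        self.calibrator = nn.Sequential(
            nn.Linear(dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        adapted = self.adapter(cls_token)      # learn
        gate = self.calibrator(adapted)        # calibrate
        return cls_token + gate * adapted      # residual; frozen backbone untouched


# Usage sketch: adapt only the final [CLS] token of a frozen ViT backbone,
# with one LuCA module and one classifier head per task.
dim = 768                                      # e.g. ViT-B hidden size
luca = LuCA(dim)
cls_token = torch.randn(8, dim)                # stand-in for frozen [CLS] outputs
head = nn.Linear(dim, 10)                      # per-task classifier
logits = head(luca(cls_token))
```

Because only the module on the last token and the classifier head are trained, the frozen backbone's representations, and hence its generalization capabilities, are left intact by construction.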