State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting. However, there is a tradeoff between the number of learned parameters and performance, which makes such models computationally expensive. In this work, we aim to reduce this cost while maintaining competitive performance. We achieve this by revisiting and extending a simple transfer learning idea: learning task-specific normalization layers. Specifically, we tune the scale and bias parameters of LayerNorm for each continual learning task and select them at inference time based on the similarity between task-specific keys and the output of the pre-trained model. To make the classifier robust to incorrect parameter selection at inference, we introduce a two-stage training procedure: we first optimize the task-specific parameters and then train the classifier using the same selection procedure as at inference time. Experiments on ImageNet-R and CIFAR-100 show that our method achieves results superior to or on par with the state of the art while being computationally cheaper.
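The inference-time selection of task-specific LayerNorm parameters can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: cosine similarity as the key-matching metric, a single LayerNorm layer, each task's scale and bias initialized as a copy of the pre-trained values, and a `query` vector standing in for the frozen pre-trained model's output. The names `TaskSpecificLayerNorm`, `add_task`, and `select_task` are hypothetical, and the training of keys, parameters, and classifier is omitted.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-6):
    """LayerNorm over the last axis with scale `gamma` and bias `beta`."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def select_task(query, keys):
    """Index of the key most similar to `query` (cosine similarity is an assumption)."""
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    return int(np.argmax(k @ q))


class TaskSpecificLayerNorm:
    """Hypothetical container: one (gamma, beta) pair and one key per task."""

    def __init__(self, pretrained_gamma, pretrained_beta):
        self.pretrained_gamma = np.asarray(pretrained_gamma, dtype=float)
        self.pretrained_beta = np.asarray(pretrained_beta, dtype=float)
        self.gammas, self.betas, self.keys = [], [], []

    def add_task(self, key):
        # Each new task starts from a copy of the pre-trained LayerNorm parameters,
        # which would then be tuned on that task's data (training not shown).
        self.gammas.append(self.pretrained_gamma.copy())
        self.betas.append(self.pretrained_beta.copy())
        self.keys.append(np.asarray(key, dtype=float))
        return len(self.keys) - 1

    def __call__(self, x, query):
        # `query` stands in for the frozen pre-trained model's output for this input;
        # the best-matching key decides which task's (gamma, beta) are applied.
        idx = select_task(query, np.stack(self.keys))
        return layer_norm(x, self.gammas[idx], self.betas[idx]), idx


# Toy usage with made-up keys and parameters.
dim = 4
ln = TaskSpecificLayerNorm(np.ones(dim), np.zeros(dim))
t0 = ln.add_task([1.0, 0.0, 0.0, 0.0])
t1 = ln.add_task([0.0, 1.0, 0.0, 0.0])
# Pretend task 1's parameters were tuned to these values.
ln.gammas[t1] = np.full(dim, 2.0)
ln.betas[t1] = np.ones(dim)

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out, idx = ln(x, query=np.array([0.1, 0.9, 0.0, 0.0]))
print(idx)  # 1
```

Because only a scale and bias vector per layer (plus one key) are stored per task, the per-task overhead grows with the hidden dimension rather than with the size of a prompt pool. An incorrect selection simply applies another task's normalization, which is the failure mode the two-stage classifier training is meant to absorb.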