Artificial neural networks often suffer from catastrophic forgetting, where learning new concepts leads to a severe loss of previously acquired knowledge. We observe that this issue is particularly pronounced in vision transformers (ViTs), where continued pre-training or fine-tuning on new tasks can significantly degrade the model's original general abilities. For instance, a DINO ViT-Base/16 pre-trained on ImageNet-1k loses over 70% accuracy on ImageNet-1k after only 10 iterations of fine-tuning on CIFAR-100. Overcoming this stability-plasticity dilemma is crucial for enabling ViTs to continuously learn and adapt to new domains while preserving their initial knowledge. In this work, we study two new parameter-efficient fine-tuning strategies: (1) Block Expansion and (2) Low-Rank Adaptation (LoRA). Our experiments reveal that applying either Block Expansion or LoRA to self-supervised pre-trained ViTs outperforms fully fine-tuned ViTs in new domains while offering significantly greater parameter efficiency. Notably, we find that Block Expansion incurs only a minimal performance drop in the pre-training domain, thereby effectively mitigating catastrophic forgetting in pre-trained ViTs.
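The abstract names the two strategies without describing them. The toy NumPy sketch below assumes their common formulations: Block Expansion in the LLaMA Pro style (insert copies of existing blocks whose output projections are zero-initialized, and train only those copies) and LoRA as a frozen weight plus a zero-initialized, scaled low-rank update. It is not the authors' implementation; the MLP-only residual block, the sizes, the insertion interval, and the names `expand_blocks` and `LoRALinear` are hypothetical. It only checks the property both methods share: at initialization, the adapted model computes exactly the same function as the pre-trained one.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size (ViT-Base uses 768)


# --- Block Expansion (toy): each "block" is a residual MLP, x + W2 @ relu(W1 @ x) ---
def residual_block(x, W1, W2):
    # Hypothetical stand-in for a ViT block; a real block also has a residual
    # attention branch, whose output projection would be zeroed as well.
    return x + np.maximum(x @ W1.T, 0.0) @ W2.T


def forward(x, blocks):
    for W1, W2 in blocks:
        x = residual_block(x, W1, W2)
    return x


def expand_blocks(blocks, every):
    """Insert a copy after every `every` pre-trained blocks, with its output
    projection zeroed so the new block starts as an exact identity map."""
    expanded, new_idx = [], []
    for i, (W1, W2) in enumerate(blocks):
        expanded.append((W1, W2))  # original block, kept frozen
        if (i + 1) % every == 0:
            expanded.append((W1.copy(), np.zeros_like(W2)))  # trainable copy
            new_idx.append(len(expanded) - 1)
    return expanded, new_idx


# --- LoRA: frozen W plus a trainable low-rank update B @ A scaled by alpha / rank ---
class LoRALinear:
    def __init__(self, W, rank=2, alpha=4.0):
        self.W = W  # frozen pre-trained weight, shape (out, in)
        self.A = rng.normal(0.0, 0.01, size=(rank, W.shape[1]))  # trainable
        self.B = np.zeros((W.shape[0], rank))  # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size


x = rng.normal(size=(4, d))
blocks = [(rng.normal(size=(4 * d, d)), rng.normal(size=(d, 4 * d))) for _ in range(4)]
expanded, new_idx = expand_blocks(blocks, every=2)
print(new_idx)                                                # [2, 5]
print(np.allclose(forward(x, blocks), forward(x, expanded)))  # True

W = rng.normal(size=(d, d))
lora = LoRALinear(W)
print(np.allclose(lora(x), x @ W.T))  # True
print(lora.num_trainable(), W.size)   # 32 64
```

In both cases only the new parameters (the inserted blocks, or `A` and `B`) would be updated during fine-tuning, while the pre-trained weights stay frozen. At ViT-Base scale (d = 768), a rank-8 LoRA adapter on one d × d projection trains 8 × (768 + 768) = 12,288 parameters against the 589,824 in the frozen weight, roughly 2%.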