Continual Learning (CL) involves adapting the prior knowledge of a Deep Neural Network (DNN) to new tasks without forgetting the old ones. However, modern CL techniques focus on provisioning memory capabilities to existing DNN models rather than designing new models that can adapt to the task at hand. This paper presents the novel Feedback Continual Learning Vision Transformer (FCL-ViT), which uses a feedback mechanism to generate real-time dynamic attention features tailored to the current task. The FCL-ViT operates in two phases. In Phase 1, generic image features are produced that determine where the Transformer should attend in the current image. In Phase 2, task-specific image features are generated that leverage this dynamic attention. To this end, Tunable self-Attention Blocks (TABs) and Task-Specific Blocks (TSBs) are introduced: the TABs operate in both phases, while the TSBs are responsible for tuning the TABs' attention. The FCL-ViT surpasses state-of-the-art benchmark methods on Continual Learning, while retaining a small number of trainable DNN parameters.
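The two-phase feedback mechanism described above can be sketched in miniature. The following is a minimal toy illustration, not the paper's actual architecture: it assumes single-head attention, a hypothetical additive bias on the attention logits standing in for the TSB's tuning of the TABs, and NumPy in place of a real DNN framework.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TunableAttentionBlock:
    """Toy TAB: self-attention whose logits can be shifted by a feedback bias."""
    def __init__(self, dim, rng):
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x, bias=None):
        # x: (n_tokens, dim) token features
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        logits = q @ k.T / np.sqrt(x.shape[-1])
        if bias is not None:
            logits = logits + bias  # feedback-tuned attention (assumption)
        return softmax(logits) @ v

class TaskSpecificBlock:
    """Toy TSB: maps Phase-1 generic features to an attention bias."""
    def __init__(self, dim, n_tokens, rng):
        self.W = rng.standard_normal((dim, n_tokens)) / np.sqrt(dim)

    def __call__(self, feats):
        # feats: (n_tokens, dim) -> (n_tokens, n_tokens) attention bias
        return feats @ self.W

def fcl_vit_forward(x, tab, tsb):
    generic = tab(x)          # Phase 1: generic features, no task feedback
    bias = tsb(generic)       # feedback path: task-specific attention tuning
    return tab(x, bias=bias)  # Phase 2: task-tailored dynamic attention
```

The same TAB weights serve both phases; only the attention pattern changes between them, which mirrors the idea of keeping the number of trainable parameters small.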