Continual Learning (CL) involves adapting prior Deep Neural Network (DNN) knowledge to new tasks without forgetting the old ones. However, modern CL techniques focus on provisioning memory capabilities for existing DNN models rather than designing new models that can adapt to the task at hand. This paper presents the novel Feedback Continual Learning Vision Transformer (FCL-ViT), which uses a feedback mechanism to generate real-time dynamic attention features tailored to the current task. The FCL-ViT operates in two phases. In phase 1, generic image features are produced that determine where the Transformer should attend on the current image. In phase 2, task-specific image features are generated that leverage this dynamic attention. To this end, Tunable self-Attention Blocks (TABs) and Task Specific Blocks (TSBs) are introduced: the TABs operate in both phases, while the TSBs are responsible for tuning the TABs' attention. The FCL-ViT surpasses benchmark methods on Continual Learning, achieving state-of-the-art performance while retaining a small number of trainable DNN parameters.
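The two-phase feedback loop described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the class names, tensor shapes, and the choice of an additive bias on the attention logits as the tuning mechanism are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TAB:
    """Tunable self-Attention Block (sketch): standard self-attention
    whose attention logits can be shifted by an external tuning signal."""
    def __init__(self, dim):
        self.Wq = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wk = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.Wv = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x, tune=None):
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        logits = q @ k.T / np.sqrt(x.shape[-1])
        if tune is not None:        # feedback signal from the TSB
            logits = logits + tune
        return softmax(logits) @ v

class TSB:
    """Task Specific Block (sketch): maps phase-1 generic features to an
    additive bias over the attention logits."""
    def __init__(self, dim, n_tokens):
        self.W = rng.standard_normal((dim, n_tokens)) / np.sqrt(dim)

    def __call__(self, feats):
        return feats @ self.W       # (n_tokens, n_tokens) attention bias

def fcl_vit_forward(x, tab, tsb):
    generic = tab(x)                # phase 1: generic image features
    bias = tsb(generic)             # feedback: where to attend for this task
    return tab(x, tune=bias)        # phase 2: task-specific features

tokens, dim = 8, 16
x = rng.standard_normal((tokens, dim))
tab, tsb = TAB(dim), TSB(dim, tokens)
out = fcl_vit_forward(x, tab, tsb)
print(out.shape)
```

In a CL setting, one TSB per task (with the shared TABs kept largely frozen) would keep the number of trainable parameters small, which is consistent with the abstract's claim, though the actual parameter-sharing scheme is not specified here.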