Deep neural networks still struggle to learn and adapt continually over a sequence of tasks because they catastrophically forget previously learned tasks. Humans, by contrast, acquire, assimilate, and transfer knowledge across tasks throughout their lifetime without catastrophic forgetting. This versatility of the brain can be attributed to the rehearsal of abstract experiences through a complementary learning system. Representation rehearsal in vision transformers, however, lacks diversity, which leads to overfitting and, consequently, to significantly lower performance than raw image rehearsal. We therefore propose BiRT, a novel representation rehearsal-based continual learning (CL) approach using vision transformers. Specifically, we introduce constructive noises at various stages of the vision transformer and enforce consistency in predictions with respect to an exponential moving average of the working model. Our method provides consistent performance gains over raw image and vanilla representation rehearsal on several challenging CL benchmarks, while being memory efficient and robust to natural and adversarial corruptions.
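The two mechanisms named above, an exponential moving average (EMA) of the working model and a consistency term between the two models' predictions, can be illustrated with a minimal sketch. This is not the BiRT implementation: the abstract does not specify the noise type, the consistency loss, or the decay rate, so the Gaussian noise, the mean squared error on softmax probabilities, the value `decay=0.999`, and all function names below are assumptions chosen only for illustration.

```python
import math
import random


def ema_update(ema_params, working_params, decay=0.999):
    """Return EMA parameters after one step: decay * ema + (1 - decay) * working.

    The decay value is an illustrative assumption, not taken from the abstract.
    """
    return [decay * e + (1.0 - decay) * w
            for e, w in zip(ema_params, working_params)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def add_gaussian_noise(features, std, rng):
    """Perturb a stored representation with zero-mean Gaussian noise.

    Gaussian noise is a stand-in (assumption) for the 'constructive noises'
    mentioned in the abstract, whose exact form is not given there.
    """
    return [f + rng.gauss(0.0, std) for f in features]


def consistency_loss(working_logits, ema_logits):
    """Mean squared error between the two models' predicted probabilities.

    MSE is an assumed choice of consistency measure.
    """
    p = softmax(working_logits)
    q = softmax(ema_logits)
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


# Toy usage: one rehearsal step on a single stored representation.
rng = random.Random(0)
stored_representation = [0.2, -0.5, 1.3]
noisy_representation = add_gaussian_noise(stored_representation, std=0.1, rng=rng)

working_logits = [1.0, 0.5, -0.2]   # hypothetical working-model output
ema_logits = [0.9, 0.6, -0.1]       # hypothetical EMA-model output
penalty = consistency_loss(working_logits, ema_logits)

ema_weights = ema_update([0.10, 0.20], [0.30, 0.00])
```

In a real training loop the consistency penalty would be added to the classification loss of the working model, and the EMA weights would be refreshed after each optimizer step; both details are assumptions about a typical setup rather than statements about the paper.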