The objective of continual learning (CL) is to learn tasks sequentially without retraining on earlier tasks. However, when subjected to CL, traditional neural networks exhibit catastrophic forgetting and limited generalization. To overcome these problems, we introduce a novel method called 'Gate and Obstruct Network' (GateON). GateON combines learnable gating of activity with online estimation of parameter relevance to safeguard crucial knowledge from being overwritten. Our method generates partially overlapping pathways between tasks, which permit forward and backward transfer during sequential learning. GateON addresses network saturation after parameter fixation with a mechanism that re-activates fixed neurons, enabling large-scale continual learning. GateON is implemented on a wide range of networks (fully connected, CNN, Transformer), has low computational complexity, effectively learns up to 100 MNIST learning tasks, and achieves top-tier results with pre-trained BERT on CL-based NLP tasks.