Existing research on task-incremental learning in continual learning has primarily focused on preventing catastrophic forgetting (CF). Although several techniques achieve learning with no CF, they do so by letting each task monopolize a sub-network of a shared network, which seriously limits knowledge transfer (KT) and over-consumes network capacity, i.e., performance deteriorates as more tasks are learned. The goal of this paper is threefold: (1) overcoming CF, (2) encouraging KT, and (3) tackling the capacity problem. We propose a novel technique, called SPG, that soft-masks (partially blocks) parameter updates during training based on the importance of each parameter to old tasks. Each task still uses the full network, i.e., no task monopolizes any part of the network, which enables maximum KT and reduces capacity usage. To our knowledge, this is the first work that soft-masks a model at the parameter level for continual learning. Extensive experiments demonstrate the effectiveness of SPG in achieving all three objectives. More notably, it attains significant knowledge transfer not only among similar tasks (with shared knowledge) but also among dissimilar tasks (with little shared knowledge), while mitigating CF.
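The soft-masking idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how parameter importance is computed or combined across tasks, so everything below is an assumption made for illustration: importance is taken to be a per-parameter score in [0, 1], each gradient is scaled by (1 - importance), and scores from successive tasks are merged with an element-wise maximum. The function names `soft_mask_step` and `accumulate_importance` are hypothetical and do not come from the paper.

```python
def soft_mask_step(params, grads, importance, lr=0.1):
    """One SGD step in which each gradient is scaled by (1 - importance).

    Assumption: importance[i] lies in [0, 1]; 0 means the parameter is free
    to change, 1 means its update is fully blocked.
    """
    if not (len(params) == len(grads) == len(importance)):
        raise ValueError("params, grads and importance must have equal length")
    updated = []
    for p, g, imp in zip(params, grads, importance):
        if not 0.0 <= imp <= 1.0:
            raise ValueError("importance scores must lie in [0, 1]")
        # Soft mask: the more important the parameter was to old tasks,
        # the smaller the step it is allowed to take.
        updated.append(p - lr * (1.0 - imp) * g)
    return updated


def accumulate_importance(old, new):
    """Merge importance across tasks (assumed rule: element-wise maximum)."""
    return [max(o, n) for o, n in zip(old, new)]


# Three parameters with identical gradients but different importance.
params = [1.0, 1.0, 1.0]
grads = [0.5, 0.5, 0.5]
importance = [0.0, 0.5, 1.0]

new_params = soft_mask_step(params, grads, importance, lr=0.1)
print(new_params)
```

In this toy run the first parameter takes the full gradient step, the second takes half of it, and the third is left unchanged. No parameter is reserved for a single task, so a later task can still read every weight in the forward pass, which is the property the abstract credits for knowledge transfer.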