This work proposes a new method for sequentially training deep neural networks on multiple tasks without catastrophic forgetting, while endowing them with the ability to adapt quickly to unseen tasks. Building on existing work on network masking (Wortsman et al., 2020), we show that simply learning a linear combination of a small number of task-specific supermasks (impressions) on a randomly initialized backbone network is sufficient both to retain accuracy on previously learned tasks and to achieve high accuracy on unseen tasks. In contrast to previous methods, we do not need to generate dedicated masks or contexts for each new task; instead, we leverage transfer learning to keep the per-task parameter overhead small. Our work illustrates the power of linearly combining individual impressions, each of which fares poorly in isolation, to achieve performance comparable to that of a dedicated mask. Moreover, even repeated impressions from the same task (homogeneous masks), when combined, can approach the performance of heterogeneous combinations if sufficiently many impressions are used. Our approach scales more efficiently than existing methods, often requiring orders of magnitude fewer parameters, and can function without modification even when task identity is missing. In addition, in the setting where task labels are not given at inference, our algorithm is often a favorable alternative to the one-shot procedure used by Wortsman et al. (2020). We evaluate our method on a number of well-known image classification datasets and network architectures.
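The core idea of linearly combining impressions on a frozen random backbone can be illustrated with a minimal NumPy sketch for a single linear layer. This is not the authors' implementation: the function names (`combine_impressions`, `masked_linear`), the layer sizes, the random 50%-density masks, and the choice to use the real-valued weighted sum directly (rather than re-binarizing it) are all assumptions made for illustration. The only per-task parameters in this sketch are the coefficients `alpha`, one per impression.

```python
import numpy as np


def combine_impressions(masks, alpha):
    """Return the weighted sum of K masks: sum_k alpha[k] * masks[k].

    masks: array of shape (K, out_dim, in_dim) holding binary supermasks.
    alpha: array of shape (K,) holding the per-task coefficients.
    """
    return np.tensordot(alpha, masks, axes=1)


def masked_linear(x, weights, masks, alpha):
    """Apply a linear layer whose frozen weights are gated by the combined mask."""
    effective_mask = combine_impressions(masks, alpha)
    return x @ (weights * effective_mask).T


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
in_dim, out_dim, num_impressions = 8, 4, 3

# Frozen, randomly initialized backbone weights (never updated).
weights = rng.standard_normal((out_dim, in_dim))

# Stand-ins for previously learned impressions: random binary masks.
masks = (rng.random((num_impressions, out_dim, in_dim)) < 0.5).astype(np.float64)

# The only per-task parameters in this sketch: one coefficient per impression.
alpha = np.full(num_impressions, 1.0 / num_impressions)

x = rng.standard_normal((5, in_dim))
y = masked_linear(x, weights, masks, alpha)
```

In this sketch, adapting to a new task would mean optimizing only `alpha` (here 3 values for the layer) while `weights` and `masks` stay fixed, which is where the small per-task parameter overhead comes from. Setting `alpha` to a one-hot vector recovers the output of a single impression.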