Artificial neural networks suffer from catastrophic forgetting when they are trained sequentially on multiple tasks. Many continual learning (CL) strategies attempt to overcome this problem, and one of the most effective is the hypernetwork-based approach, in which a hypernetwork generates the weights of a target model based on the task's identity. The main limitation of this approach is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To address this problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of the whole network. In this paper, we propose a method called HyperMask, which trains a single network for all CL tasks. The hypernetwork produces semi-binary masks to obtain target subnetworks dedicated to consecutive tasks. Moreover, thanks to the lottery ticket hypothesis, we can use a single network with weighted subnetworks: depending on the task, the importance of some weights may be dynamically enhanced while that of others is weakened. HyperMask achieves competitive results on several CL datasets and, in some scenarios, exceeds state-of-the-art scores, both with derived and unknown task identities.
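The masking idea described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the one-layer linear hypernetwork, the `tanh` squashing, the magnitude-based pruning rule, and all names (`make_hypernetwork`, `semi_binary_mask`, `masked_weights`, `sparsity`) are assumptions made for illustration. Here "semi-binary" is read as "each mask entry is either exactly zero or a real value in (-1, 1)", which is one plausible interpretation of the term.

```python
import math
import random


def make_hypernetwork(emb_dim, n_weights, seed=0):
    """Hypothetical one-layer hypernetwork: a fixed random linear map.

    In a real system these parameters would be trained; here they are
    random and serve only to show the data flow.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
            for _ in range(n_weights)]


def semi_binary_mask(hyper_w, task_embedding, sparsity):
    """Map a task embedding to a mask with entries that are 0 or in (-1, 1).

    Assumption: tanh squashing followed by zeroing the `sparsity` fraction
    of entries with the smallest magnitude.
    """
    # Linear layer followed by tanh: every score lies in (-1, 1).
    scores = [math.tanh(sum(w * e for w, e in zip(row, task_embedding)))
              for row in hyper_w]
    # Indices of the lowest-magnitude scores, which get pruned to zero.
    n_zero = int(sparsity * len(scores))
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    pruned = set(order[:n_zero])
    return [0.0 if i in pruned else s for i, s in enumerate(scores)]


def masked_weights(shared_weights, mask):
    """Apply a task-specific mask elementwise to the single shared network."""
    return [w * m for w, m in zip(shared_weights, mask)]


# One shared weight vector stands in for the target network's parameters.
shared = [0.5, -1.2, 0.3, 0.8, -0.7, 1.1, -0.4, 0.9, 0.2, -0.6]
hyper = make_hypernetwork(emb_dim=3, n_weights=len(shared))

# Hypothetical task embeddings; in practice these would be learned per task.
task_embeddings = {0: [1.0, 0.0, 0.5], 1: [-0.5, 1.0, 0.0]}

for task_id, emb in task_embeddings.items():
    mask = semi_binary_mask(hyper, emb, sparsity=0.3)
    print(task_id, masked_weights(shared, mask))
```

In this sketch the shared weights never change between tasks; only the mask does. Nonzero mask entries rescale individual weights, which mirrors the statement that the importance of some weights is enhanced while that of others is weakened, and zero entries select the task's sparse subnetwork.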