Artificial neural networks suffer from catastrophic forgetting when they are trained sequentially on multiple tasks. Many continual learning strategies have been proposed to overcome this problem, and one of the most effective is the hypernetwork-based approach, in which a hypernetwork generates the weights of a target model based on the task's identity. The main limitation of this approach is that the hypernetwork can produce completely different networks for each task, so each task is solved separately. The model does not reuse information from the networks dedicated to previous tasks and in practice produces a new architecture for each subsequent task. To address this problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of a full network. In this paper, we propose a method called HyperMask, which trains a single network for all tasks. The hypernetwork produces semi-binary masks that select target subnetworks dedicated to new tasks. This solution inherits the ability of the hypernetwork to adapt to new tasks with minimal forgetting. Moreover, thanks to the lottery ticket hypothesis, we can use a single network with weighted subnetworks dedicated to each task.
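The masking idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the one-layer hypernetwork, the names `task_emb`, `hyper_w`, and `semi_binary_mask`, the `sparsity` parameter, and the reading of "semi-binary" as "each mask entry is either exactly zero or a continuous value in [0, 1]" are all assumptions made for illustration, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TASKS, EMB_DIM, IN_DIM, OUT_DIM = 3, 8, 4, 2
N_TARGET = IN_DIM * OUT_DIM  # number of weights in the shared target layer

# Shared target-network weights: a single set used by all tasks.
target_w = rng.normal(size=(IN_DIM, OUT_DIM))

# Hypothetical hypernetwork: one embedding per task plus one linear layer.
task_emb = rng.normal(size=(N_TASKS, EMB_DIM))
hyper_w = rng.normal(size=(EMB_DIM, N_TARGET))


def semi_binary_mask(task_id, sparsity=0.5):
    """Return a mask whose entries are either 0 or a value in [0, 1].

    Assumption: the smallest-magnitude hypernetwork outputs are zeroed,
    and the remaining entries keep their continuous values.
    """
    scores = np.abs(np.tanh(task_emb[task_id] @ hyper_w))
    k = int(sparsity * scores.size)  # number of entries to zero out
    if k > 0:
        smallest = np.argpartition(scores, k - 1)[:k]
        scores[smallest] = 0.0
    return scores.reshape(IN_DIM, OUT_DIM)


def forward(x, task_id):
    """Apply the task-specific weighted subnetwork of the shared layer."""
    return x @ (target_w * semi_binary_mask(task_id))


x = rng.normal(size=(5, IN_DIM))
outputs = [forward(x, t) for t in range(N_TASKS)]
```

In this sketch every task reuses the same `target_w`, and only the mask changes with the task identity, which mirrors the single-network design described above. How the mask and the target weights are trained jointly is outside the scope of the example.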