Artificial neural networks suffer from catastrophic forgetting when they are trained sequentially on multiple tasks. Many continual learning (CL) strategies aim to overcome this problem; one of the most effective is the hypernetwork-based approach, in which a hypernetwork generates the weights of a target model conditioned on the task identity. The main limitation of this model is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To address this problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, called winning tickets, that preserve the performance of the full network. In this paper, we propose a method called HyperMask, which dynamically filters a target network depending on the CL task: the hypernetwork produces semi-binary masks that carve out dedicated target subnetworks. Moreover, thanks to the lottery ticket hypothesis, we can use a single network with weighted subnets: depending on the task, the importance of some weights is dynamically enhanced while that of others is weakened. HyperMask achieves competitive results on several CL datasets and, in some scenarios, exceeds state-of-the-art scores, both with derived and unknown task identities.
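The masking idea can be illustrated with a minimal sketch. All names and dimensions below are hypothetical, not taken from the paper: a toy linear "hypernetwork" maps a task embedding to mask scores, values below a threshold are zeroed (making the mask "semi-binary"), and the surviving scores re-weight a single shared set of target weights, yielding a different weighted subnetwork per task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (illustration only): task embedding dim and target weight count.
EMB_DIM, N_WEIGHTS = 8, 16

# One shared set of target-network weights, reused across all tasks.
target_weights = rng.normal(size=N_WEIGHTS)

# Toy "hypernetwork": a single linear map from task embedding to mask logits.
hyper_W = rng.normal(size=(N_WEIGHTS, EMB_DIM))

def semi_binary_mask(task_emb, threshold=0.5):
    """Sigmoid scores with sub-threshold entries clipped to zero:
    each weight is either dropped entirely or softly re-weighted."""
    scores = 1.0 / (1.0 + np.exp(-hyper_W @ task_emb))  # values in (0, 1)
    return np.where(scores >= threshold, scores, 0.0)

def masked_weights(task_emb):
    # Element-wise product selects a weighted subnetwork for this task.
    return target_weights * semi_binary_mask(task_emb)

# Two task embeddings carve two different subnetworks out of one network.
emb_a, emb_b = rng.normal(size=EMB_DIM), rng.normal(size=EMB_DIM)
w_a, w_b = masked_weights(emb_a), masked_weights(emb_b)
```

The thresholded sigmoid is one simple way to realize a semi-binary mask; the enhancement and weakening of weights described above corresponds to the surviving scores scaling each shared weight differently per task.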