HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning

Artificial neural networks suffer from catastrophic forgetting when they are trained sequentially on multiple tasks. Many continual learning (CL) strategies attempt to overcome this problem, and one of the most effective is the hypernetwork-based approach, in which a hypernetwork generates the weights of a target model based on the task's identity. The main limitation of this approach is that, in practice, the hypernetwork can produce completely different architectures for subsequent tasks. To address this problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, named winning tickets, that preserve the performance of the whole network. In this paper, we propose a method called HyperMask, which trains a single network for all CL tasks. The hypernetwork produces semi-binary masks to obtain target subnetworks dedicated to consecutive tasks. Moreover, thanks to the lottery ticket hypothesis, we can use a single network with weighted subnets: depending on the task, the importance of some weights may be dynamically enhanced while that of others may be weakened. HyperMask achieves competitive results on several CL datasets and, in some scenarios, goes beyond the state-of-the-art scores, both with derived and unknown task identities.
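The masking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear hypernetwork, the tanh squashing, the magnitude-based pruning rule, and all names (`make_hypernetwork`, `semi_binary_mask`, `apply_mask`, `sparsity`) are assumptions made for illustration, and training is omitted entirely. The sketch only shows how one shared weight vector can yield a different weighted subnetwork per task.

```python
import math
import random


def make_hypernetwork(embed_dim, n_weights, seed=0):
    """Toy linear hypernetwork: one row of coefficients per target weight (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
            for _ in range(n_weights)]


def semi_binary_mask(hyper, task_embedding, sparsity):
    """Map a task embedding to a mask with exact zeros and real-valued survivors."""
    # Squash raw scores into (-1, 1); tanh is an assumption of this sketch.
    scores = [math.tanh(sum(w * e for w, e in zip(row, task_embedding)))
              for row in hyper]
    # Zero out the fraction `sparsity` of entries with the smallest magnitude.
    n_zero = int(sparsity * len(scores))
    order = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    pruned = set(order[:n_zero])
    return [0.0 if i in pruned else s for i, s in enumerate(scores)]


def apply_mask(shared_weights, mask):
    """Element-wise product: the task-specific weighted subnetwork."""
    return [w * m for w, m in zip(shared_weights, mask)]


# One shared target network (here just a flat list of 10 weights).
rng = random.Random(1)
shared_weights = [rng.uniform(-1.0, 1.0) for _ in range(10)]
hyper = make_hypernetwork(embed_dim=4, n_weights=10)

# Hypothetical task embeddings; in a real system these would be learned.
task_embeddings = {
    0: [0.9, -0.2, 0.1, 0.4],
    1: [-0.5, 0.7, 0.3, -0.8],
}

masks = {t: semi_binary_mask(hyper, e, sparsity=0.5)
         for t, e in task_embeddings.items()}
subnets = {t: apply_mask(shared_weights, m) for t, m in masks.items()}
```

The mask is "semi-binary" in the sense that pruned positions are exactly zero, while surviving positions keep a real value that can strengthen or weaken the corresponding shared weight for a given task.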

