HyperMask: Adaptive Hypernetwork-based Masks for Continual Learning

Artificial neural networks suffer from catastrophic forgetting when they are trained sequentially on multiple tasks. Many continual learning strategies have been proposed to overcome this problem. One of the most effective is the hypernetwork-based approach, in which a hypernetwork generates the weights of a target model from the task's identity. The main limitation of this approach is that the hypernetwork can produce completely different networks for each task. Consequently, each task is solved separately: the model does not use information from the network dedicated to previous tasks and effectively produces a new architecture whenever it learns a subsequent task. To solve this problem, we use the lottery ticket hypothesis, which postulates the existence of sparse subnetworks, called winning tickets, that preserve the performance of the full network. In this paper, we propose a method called HyperMask, which trains a single network for all tasks. The hypernetwork produces semi-binary masks that select target subnetworks dedicated to new tasks. This solution inherits the hypernetwork's ability to adapt to new tasks with minimal forgetting. Moreover, owing to the lottery ticket hypothesis, we can use a single network with weighted subnetworks dedicated to each task.
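The core idea, a hypernetwork that emits a mask over one shared set of target weights instead of emitting a whole new weight set per task, can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a linear hypernetwork, one embedding per task, and a semi-binary mask built by zeroing the smallest-magnitude fraction (`sparsity`) of the hypernetwork's outputs and mapping the survivors into [0, 1] with `tanh` of their magnitude. These choices and all names (`ToyHyperMask`, `mask`, `task_weights`, `sparsity`) are hypothetical. Training is omitted; the parameters are just random draws.

```python
import math
import random
from typing import List


class ToyHyperMask:
    """Hypothetical toy model: one shared target weight vector plus a
    linear hypernetwork that turns a per-task embedding into a mask."""

    def __init__(self, num_tasks: int, emb_dim: int, num_weights: int,
                 sparsity: float, seed: int = 0) -> None:
        rng = random.Random(seed)
        # One embedding per task (would be learned in practice).
        self.task_emb = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                         for _ in range(num_tasks)]
        # Linear hypernetwork: one row of coefficients per target weight.
        self.hnet = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)]
                     for _ in range(num_weights)]
        # Target weights shared by all tasks.
        self.target_w = [rng.gauss(0.0, 1.0) for _ in range(num_weights)]
        self.sparsity = sparsity

    def mask(self, task_id: int) -> List[float]:
        emb = self.task_emb[task_id]
        # Hypernetwork output magnitudes: one score per target weight.
        mags = [abs(sum(h * e for h, e in zip(row, emb))) for row in self.hnet]
        # Assumed rule: prune the k smallest-magnitude scores to exactly zero.
        k = int(len(mags) * self.sparsity)
        pruned = set(sorted(range(len(mags)), key=lambda i: mags[i])[:k])
        # Surviving scores are squashed into [0, 1], giving a semi-binary mask.
        return [0.0 if i in pruned else math.tanh(m)
                for i, m in enumerate(mags)]

    def task_weights(self, task_id: int) -> List[float]:
        # Task-specific subnetwork: shared weights scaled by the task's mask.
        return [w * m for w, m in zip(self.target_w, self.mask(task_id))]


if __name__ == "__main__":
    model = ToyHyperMask(num_tasks=3, emb_dim=4, num_weights=10, sparsity=0.5)
    for t in range(3):
        m = model.mask(t)
        print(t, "zeros:", sum(v == 0.0 for v in m),
              "weights:", [round(w, 3) for w in model.task_weights(t)])
```

Every task reads the same `target_w`; only the mask changes. Pruned entries select a sparse subnetwork in the spirit of a winning ticket, and the non-zero mask values weight the surviving connections. A plain hypernetwork would instead output a full, unrelated `target_w` for each task.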
