Weighted Ensemble Models Are Strong Continual Learners

In this work, we study the problem of continual learning (CL), where the goal is to learn a model on a sequence of tasks under the constraint that data from previous tasks is no longer available when training on the current task. CL is essentially a balancing act between learning the new task (i.e., plasticity) and maintaining performance on previously learned concepts (i.e., stability). To address this stability-plasticity trade-off, we propose weight-ensembling the model parameters of the previous and current tasks. This weight-ensembled model, which we call Continual Model Averaging (CoMA), attains high accuracy on the current task by leveraging plasticity, while ensuring stability by not deviating too far from the previous weight configuration. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (CoFiMA), which selectively weighs each parameter in the weight ensemble according to the Fisher information of the model's weights. Both variants are conceptually simple, easy to implement, and attain state-of-the-art performance on several standard CL benchmarks.
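The two averaging rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: parameters are plain Python lists keyed by name, `alpha` is a hypothetical mixing coefficient on the previous-task weights, and the CoFiMA-style rule is written as a per-parameter Fisher-weighted average using the empirical diagonal Fisher (mean of squared per-sample gradients). The exact weighting and normalization used in the paper may differ. The names `coma`, `cofima`, and `diag_fisher` are illustrative only.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def coma(prev: Params, curr: Params, alpha: float = 0.5) -> Params:
    """CoMA-style sketch: element-wise interpolation of two weight sets.

    theta = alpha * theta_prev + (1 - alpha) * theta_curr, where alpha
    (an assumed hyperparameter) is the weight kept on the previous-task
    model (stability) and 1 - alpha goes to the model fine-tuned on the
    current task (plasticity).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        name: [alpha * p + (1.0 - alpha) * c for p, c in zip(prev[name], curr[name])]
        for name in prev
    }


def diag_fisher(per_sample_grads: List[Params]) -> Params:
    """Empirical diagonal Fisher: mean of squared per-sample gradients
    of the log-likelihood (gradients are assumed to be precomputed)."""
    n = len(per_sample_grads)
    first = per_sample_grads[0]
    return {
        name: [sum(g[name][i] ** 2 for g in per_sample_grads) / n
               for i in range(len(first[name]))]
        for name in first
    }


def cofima(prev: Params, curr: Params, fisher_prev: Params, fisher_curr: Params,
           alpha: float = 0.5, eps: float = 1e-8) -> Params:
    """CoFiMA-style sketch: per-parameter Fisher-weighted average.

    theta_i = (a * F_prev_i * prev_i + (1 - a) * F_curr_i * curr_i)
              / (a * F_prev_i + (1 - a) * F_curr_i)
    eps is added to each Fisher value so that parameters with zero Fisher
    on both sides fall back to the plain CoMA interpolation.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    merged: Params = {}
    for name in prev:
        out = []
        for p, c, fp, fc in zip(prev[name], curr[name], fisher_prev[name], fisher_curr[name]):
            wp = alpha * (fp + eps)          # importance of the previous-task weight
            wc = (1.0 - alpha) * (fc + eps)  # importance of the current-task weight
            out.append((wp * p + wc * c) / (wp + wc))
        merged[name] = out
    return merged
```

With equal Fisher values, `cofima` reduces to `coma`; where the previous task's Fisher dominates a parameter, the merged value stays close to the previous weight, which is the stability effect the per-parameter weighting is meant to provide.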

