Weighted Ensemble Models Are Strong Continual Learners

In this work, we study the problem of continual learning (CL), where the goal is to learn a model on a sequence of tasks such that data from previous tasks is unavailable while learning on the current task. CL is essentially a balancing act between learning the new task (i.e., plasticity) and maintaining performance on previously learned concepts (i.e., stability). To address this stability-plasticity trade-off, we propose weight-ensembling the model parameters of the previous and current tasks. The resulting weight-ensembled model, which we call Continual Model Averaging (or CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (or CoFiMA), that selectively weighs each parameter in the weight ensemble by leveraging the Fisher information of the model's weights. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.
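The two averaging schemes described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The function names, the interpolation coefficient `lam`, the `eps` stabilizer, and the exact form of the Fisher-weighted formula are assumptions made for illustration; the abstract only states that previous and current parameters are averaged, and that CoFiMA weighs each parameter by its Fisher information. Parameters are represented as plain dictionaries of float lists, and the diagonal Fisher estimates are assumed to be precomputed.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def coma_average(prev: Params, curr: Params, lam: float = 0.5) -> Params:
    """Uniform interpolation: lam * current + (1 - lam) * previous.

    `lam` is an assumed hyperparameter; larger values favor the
    current task (plasticity), smaller values the previous weights
    (stability).
    """
    return {
        name: [lam * c + (1.0 - lam) * p
               for p, c in zip(prev[name], curr[name])]
        for name in prev
    }


def cofima_average(prev: Params, curr: Params,
                   fisher_prev: Params, fisher_curr: Params,
                   lam: float = 0.5, eps: float = 1e-8) -> Params:
    """Per-parameter average weighted by (assumed diagonal) Fisher values.

    Each parameter is pulled toward whichever model has the larger
    Fisher-scaled weight for that parameter. `eps` avoids division
    by zero when both Fisher values vanish.
    """
    merged: Params = {}
    for name in prev:
        merged[name] = []
        for p, c, fp, fc in zip(prev[name], curr[name],
                                fisher_prev[name], fisher_curr[name]):
            w_prev = (1.0 - lam) * fp
            w_curr = lam * fc
            merged[name].append((w_prev * p + w_curr * c)
                                / (w_prev + w_curr + eps))
    return merged


if __name__ == "__main__":
    prev = {"w": [0.0, 2.0]}
    curr = {"w": [1.0, 4.0]}
    print(coma_average(prev, curr, lam=0.5))
    # First parameter matters only to the previous model,
    # second only to the current one.
    print(cofima_average(prev, curr,
                         fisher_prev={"w": [1.0, 0.0]},
                         fisher_curr={"w": [0.0, 1.0]}))
```

With equal Fisher values for both models, the Fisher-weighted version in this sketch reduces (up to `eps`) to the uniform interpolation, which is one way to see CoMA as a special case of per-parameter weighting.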

