In this work, we study continual learning (CL), where the goal is to learn a model on a sequence of tasks such that data from previous tasks is unavailable while training on the current task. CL is essentially a balancing act between learning the new task (i.e., plasticity) and maintaining performance on previously learned concepts (i.e., stability). To address this stability-plasticity trade-off, we propose weight-ensembling of the model parameters of the previous and current tasks. The resulting weight-ensembled model, which we call Continual Model Averaging (CoMA), attains high accuracy on the current task by leveraging plasticity, while not deviating too far from the previous weight configuration, thereby ensuring stability. We also propose an improved variant of CoMA, named Continual Fisher-weighted Model Averaging (CoFiMA), which selectively weighs each parameter in the weight ensemble using the Fisher information of the model's weights. Both variants are conceptually simple, easy to implement, and effective in attaining state-of-the-art performance on several standard CL benchmarks. Code is available at: https://github.com/IemProg/CoFiMA.
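A minimal NumPy sketch of the two averaging schemes described above. The mixing coefficient `alpha`, the diagonal Fisher approximation, and the per-parameter normalization are assumptions for illustration; the paper's exact formulation may differ:

```python
import numpy as np

def coma_average(theta_prev, theta_curr, alpha=0.5):
    """CoMA-style sketch: uniform weight ensembling as a convex
    combination of previous- and current-task parameters.
    alpha is an assumed mixing coefficient."""
    return (1.0 - alpha) * theta_prev + alpha * theta_curr

def cofima_average(theta_prev, theta_curr, fisher_prev, fisher_curr, eps=1e-8):
    """CoFiMA-style sketch: each parameter is weighted by its (assumed
    diagonal) Fisher information, so parameters that are important to a
    task dominate the ensemble for that coordinate."""
    num = fisher_prev * theta_prev + fisher_curr * theta_curr
    den = fisher_prev + fisher_curr + eps  # eps avoids division by zero
    return num / den
```

With `alpha=0.5`, `coma_average` reduces to a plain midpoint of the two weight vectors, while `cofima_average` shifts each coordinate toward whichever model has higher Fisher information there.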