Continual Learners are Incremental Model Generalizers

Motivated by the efficiency and rapid convergence of pre-trained models on downstream tasks, this paper extensively studies Continual Learning (CL) models as pre-trainers. In both supervised and unsupervised CL, we find that the transfer quality of the learned representation often increases gradually, without noticeable degradation in fine-tuning performance. This is because CL models can learn improved task-generic features while readily forgetting task-specific knowledge. Based on this observation, we propose a new unsupervised CL framework with masked modeling, which aims to capture fluent task-generic representations during training. Furthermore, we propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks. A model fine-tuned with GLAD achieves competitive performance and can itself serve as a good pre-trained model. We believe this paper breaks the barrier between the pre-training and fine-tuning steps and leads to a sustainable learning framework in which the continual learner incrementally improves model generalization, yielding better transfer to unseen tasks.

