Towards Robust and Efficient Continual Language Learning

As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential for positive transfer, high potential for negative transfer, no expected effect, or a mixture of these. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any distracting tasks that may confuse it. We then propose a simple yet effective learner that satisfies many of our desiderata by leveraging a selective strategy for initializing new models from past task checkpoints. Still, limitations remain, and we hope this benchmark can help the community to further build and analyze such learners.
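The idea of selectively initializing from past task checkpoints can be illustrated with a minimal sketch. The abstract does not specify the selection rule, so everything here is an assumption made for illustration: the function name `select_checkpoint`, the use of a single validation score per candidate, and the `margin` threshold are all hypothetical and are not the authors' method. The sketch simply prefers a past-task checkpoint only when it scores better on the new task than the default pretrained model, and falls back to the pretrained model otherwise, which is one plausible way to guard against negative transfer.

```python
from typing import Dict


def select_checkpoint(
    scores: Dict[str, float],
    baseline: str,
    margin: float = 0.0,
) -> str:
    """Pick which checkpoint to initialize the new task's model from.

    `scores` maps a checkpoint name to a (hypothetical) validation score on
    the NEW task, higher is better. `baseline` names the default pretrained
    checkpoint. A past-task checkpoint is chosen only if it beats the
    baseline by more than `margin`; otherwise the baseline is kept.
    """
    if baseline not in scores:
        raise KeyError(f"baseline checkpoint {baseline!r} has no score")

    # Best-scoring candidate overall (may be the baseline itself).
    best = max(scores, key=lambda name: scores[name])

    # Only move away from the baseline when the gain is clear.
    if best != baseline and scores[best] - scores[baseline] > margin:
        return best
    return baseline


if __name__ == "__main__":
    # Illustrative scores only; these are not results from the paper.
    example_scores = {"pretrained": 0.71, "task_a": 0.78, "task_b": 0.52}
    print(select_checkpoint(example_scores, baseline="pretrained"))
```

With the illustrative scores above, the call selects `task_a`; if no past checkpoint beat the baseline by more than `margin`, it would return `pretrained`, so a distracting earlier task cannot make the starting point worse under this assumed rule.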

