Incremental Learning (IL) has been a long-standing problem in both the vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress on various NLP downstream tasks, utilizing PLMs as backbones has become common practice in recent IL research in NLP. Most studies assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find this assumption problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on this observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* achieves competitive or superior performance compared with state-of-the-art (SOTA) IL methods while requiring considerably fewer trainable parameters and less training time. These findings urge us to revisit IL with PLMs and encourage future studies to build a fundamental understanding of catastrophic forgetting in PLMs. The data, code, and scripts are publicly available at https://github.com/zzz47zzz/pretrained-lm-for-incremental-learning.