Incremental Learning (IL) has been a long-standing problem in both the vision and Natural Language Processing (NLP) communities. In recent years, as Pre-trained Language Models (PLMs) have achieved remarkable progress on various NLP downstream tasks, utilizing PLMs as backbones has become common practice in recent IL research in NLP. Most studies assume that catastrophic forgetting is the biggest obstacle to achieving superior IL performance and propose various techniques to overcome this issue. However, we find that this assumption is problematic. Specifically, we revisit more than 20 methods on four classification tasks (Text Classification, Intent Classification, Relation Extraction, and Named Entity Recognition) under the two most popular IL settings (Class-Incremental and Task-Incremental) and reveal that most of them severely underestimate the inherent anti-forgetting ability of PLMs. Based on this observation, we propose a frustratingly easy method called SEQ* for IL with PLMs. The results show that SEQ* achieves competitive or superior performance compared to state-of-the-art (SOTA) IL methods while requiring considerably fewer trainable parameters and less training time. These findings urge us to revisit IL with PLMs and encourage future studies to build a fundamental understanding of catastrophic forgetting in PLMs. The data, code, and scripts are publicly available at https://github.com/zzz47zzz/codebase-for-incremental-learning-with-llm.