Continual learning (CL) aims to continually acquire new knowledge over time while avoiding catastrophic forgetting of old tasks. In this work, we focus on continual text classification under the class-incremental setting. Recent CL studies have found that representations learned on one task may not be effective for other tasks, a phenomenon known as the representation bias problem. For the first time, we formally analyze representation bias from an information bottleneck perspective and suggest that exploiting representations containing more class-relevant information can alleviate this bias. To this end, we propose RepCL, a novel replay-based continual text classification method. Our approach uses contrastive and generative representation learning objectives to capture more class-relevant features. In addition, RepCL introduces an adversarial replay strategy to alleviate the overfitting that replay tends to cause. Experiments demonstrate that RepCL effectively alleviates forgetting and achieves state-of-the-art performance on three text classification tasks.
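The abstract names a contrastive representation learning objective but does not give its form. As an illustration only, the sketch below implements a generic supervised contrastive loss in the style of Khosla et al. (2020), under the assumption that samples sharing a class label act as positives for one another. It is not RepCL's actual objective; the function name `supervised_contrastive_loss` and the default `temperature=0.1` are hypothetical choices.

```python
import math
from typing import Sequence


def supervised_contrastive_loss(features: Sequence[Sequence[float]],
                                labels: Sequence[int],
                                temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss (illustrative sketch, not RepCL's loss).

    features: one representation vector per sample (hypothetical encoder output).
    labels:   class label per sample; same-label samples are treated as positives.
    """
    # L2-normalise each representation so dot products are cosine similarities.
    z = []
    for v in features:
        norm = math.sqrt(sum(x * x for x in v))
        z.append([x / norm for x in v])

    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: every other sample that shares anchor i's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        others = [j for j in range(n) if j != i]
        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in others}
        # Stable log-sum-exp over the softmax denominator (all non-anchor samples).
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Average negative log-probability assigned to the positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1

    if anchors == 0:
        raise ValueError("need at least one pair of samples sharing a label")
    return total / anchors
```

When same-class representations cluster together, the loss approaches zero; when a class's samples are spread across clusters, it grows. Pulling same-class representations together is one way to encourage the class-relevant features the abstract refers to.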