Despite the great success of pre-trained language models, using them for continual learning remains a challenge, especially in the class-incremental learning (CIL) setting, because of catastrophic forgetting (CF). This paper reports our finding that formulating CIL as a continual label generation problem drastically reduces CF and better retains the generalizable representations of pre-trained models. We therefore propose a new CIL method, VAG, that additionally leverages the sparsity of the vocabulary to focus the generation and uses label semantics to create pseudo-replay samples. Experimental results show that VAG outperforms baselines by a large margin.