Class-incremental learning (CIL) is the process of continually learning new object classes from incremental data without forgetting previously learned classes. CIL algorithms are commonly evaluated by average test accuracy over all learned classes, but we argue that maximizing accuracy alone does not necessarily lead to effective CIL algorithms. In this paper, we experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols from representation learning and propose a new analysis method. Our experiments show that most state-of-the-art algorithms prioritize high stability and do not significantly change the learned representation, and that they sometimes even learn a representation of lower quality than a naive baseline. However, these algorithms can still achieve high test accuracy because they learn a classifier that is closer to the optimal classifier. We also find that the representation quality of the base model learned in the first task varies across algorithms, and that final performance changes when each algorithm is trained from a base model of similar representation quality. We therefore suggest representation-level evaluation as an additional recipe for more objective evaluation and effective development of CIL algorithms.
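To make the distinction between classifier-level and representation-level evaluation concrete, the following is a minimal sketch on synthetic data, not the analysis method proposed in the paper. It assumes frozen features are already extracted, and it uses a closed-form ridge-regression probe as a stand-in for the linear probing protocols common in representation learning. All names (`linear_probe_accuracy`, `make_features`, the biased head) are hypothetical and chosen for illustration only.

```python
import numpy as np


def accuracy(pred, labels):
    """Fraction of predictions that match the labels."""
    return float(np.mean(pred == labels))


def linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes, l2=1e-3):
    """Fit a ridge-regression probe on frozen features and return test accuracy.

    Assumption: a least-squares probe on one-hot targets stands in for a
    softmax linear probe trained by gradient descent.
    """
    def add_bias(x):
        return np.hstack([x, np.ones((x.shape[0], 1))])

    xtr, xte = add_bias(train_x), add_bias(test_x)
    targets = np.eye(num_classes)[train_y]            # one-hot labels
    gram = xtr.T @ xtr + l2 * np.eye(xtr.shape[1])    # regularized Gram matrix
    weights = np.linalg.solve(gram, xtr.T @ targets)  # closed-form ridge solution
    return accuracy((xte @ weights).argmax(axis=1), test_y)


def make_features(rng, n_per_class, means, noise=0.5):
    """Draw synthetic 'frozen features' as Gaussian clusters around class means."""
    xs, ys = [], []
    for c, mean in enumerate(means):
        xs.append(mean + noise * rng.standard_normal((n_per_class, means.shape[1])))
        ys.append(np.full(n_per_class, c))
    return np.vstack(xs), np.concatenate(ys)


rng = np.random.default_rng(0)
num_classes, dim = 4, 8
means = 4.0 * np.eye(num_classes, dim)  # well-separated class means
train_x, train_y = make_features(rng, 200, means)
test_x, test_y = make_features(rng, 100, means)

# Hypothetical classifier head: a nearest-class-mean rule whose bias is
# inflated for the "new" classes 2 and 3, mimicking a head that favors
# recently learned classes over old ones.
head_bias = -0.5 * (means ** 2).sum(axis=1) + np.array([0.0, 0.0, 100.0, 100.0])
head_pred = (test_x @ means.T + head_bias).argmax(axis=1)

head_acc = accuracy(head_pred, test_y)
probe_acc = linear_probe_accuracy(train_x, train_y, test_x, test_y, num_classes)
print(f"accuracy with biased head: {head_acc:.2f}")
print(f"linear-probe accuracy:     {probe_acc:.2f}")
```

Both numbers are computed on the same features, so the gap between them comes entirely from the classifier head. This is one way to see why average accuracy alone cannot separate representation quality from classifier quality.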