An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning

Class-Incremental Learning (CIL) aims to build classification models from data streams. At each step of the CIL process, new classes must be integrated into the model. Because of catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored, which is the exemplar-free setting we focus on here. To date, most approaches rely exclusively on the target dataset of the CIL process. However, models pre-trained in a self-supervised way on large amounts of data have recently gained momentum. The initial model of the CIL process can be trained either on the first batch of the target dataset alone, or on that batch starting from pre-trained weights obtained on an auxiliary dataset. The choice between these two initial training strategies can significantly influence the performance of the incremental learning model, but has not yet been studied in depth. Performance is also influenced by the choice of CIL algorithm, the neural architecture, the nature of the target task, the distribution of classes in the stream and the number of examples available for learning. We conduct a comprehensive experimental study to assess the role of these factors, and present a statistical analysis framework that quantifies the relative contribution of each factor to incremental performance. Our main finding is that the initial training strategy is the dominant factor influencing average incremental accuracy, but that the choice of CIL algorithm is more important in preventing forgetting. Based on this analysis, we propose practical recommendations for choosing the right initial training strategy for a given incremental learning use case. These recommendations are intended to facilitate the practical deployment of incremental learning.
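The main finding contrasts two evaluation criteria: average incremental accuracy and forgetting. The sketch below shows one way to compute them. It assumes the definitions commonly used in the CIL literature. Average incremental accuracy is taken as the mean, over steps, of the accuracy on all classes seen so far. Forgetting is taken as the mean, over earlier class batches, of the drop from their best earlier accuracy to their final accuracy. The function names and the lower-triangular accuracy layout are illustrative assumptions, not the paper's evaluation code. The paper's exact definitions may differ; for example, some works exclude the initial step from the average.

```python
from typing import Sequence


def average_incremental_accuracy(step_acc: Sequence[float]) -> float:
    """Mean of the accuracies on all seen classes, one value per incremental step.

    Assumed convention: step_acc[i] is the top-1 accuracy on every class seen
    so far, measured right after training step i (initial step included).
    """
    if not step_acc:
        raise ValueError("need at least one step")
    return sum(step_acc) / len(step_acc)


def average_forgetting(acc: Sequence[Sequence[float]]) -> float:
    """Mean drop from best earlier accuracy to final accuracy, per class batch.

    Assumed layout (hypothetical): acc[i][j] is the accuracy on the classes
    introduced at step j, measured after step i, for j <= i (lower-triangular).
    """
    num_steps = len(acc)
    if num_steps < 2:
        return 0.0  # a single step leaves nothing to forget
    final = acc[num_steps - 1]
    drops = []
    # The last batch is excluded: it has no earlier measurement to compare with.
    for j in range(num_steps - 1):
        best_before = max(acc[i][j] for i in range(j, num_steps - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)


if __name__ == "__main__":
    print(average_incremental_accuracy([0.8, 0.7, 0.6]))  # approximately 0.7
    print(average_forgetting([[0.9], [0.7, 0.85], [0.6, 0.75, 0.8]]))  # approximately 0.2
```

In the usage example, the first class batch drops from 0.9 to 0.6 and the second from 0.85 to 0.75, so the mean forgetting is about 0.2. A model can have high average incremental accuracy and still forget a lot, which is why the two criteria can favour different factors.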