We propose and study a realistic Continual Learning (CL) setting in which learning algorithms are granted a restricted computational budget per time step during training. We apply this setting to large-scale semi-supervised Continual Learning scenarios with sparse label rates. CL methods that were previously proficient perform very poorly in this challenging setting. Overfitting to the sparse labeled data and an insufficient computational budget are the two main culprits behind this poor performance. Our new setting encourages learning methods to utilize the unlabeled data effectively and efficiently during training. To that end, we propose a simple but highly effective baseline, DietCL, which utilizes unlabeled and labeled data jointly. DietCL carefully allocates the computational budget between the two types of data. We validate our baseline, at scale, on several datasets, e.g., CLOC, ImageNet10K, and CGLM, under constrained-budget setups. DietCL outperforms, by a large margin, all existing supervised CL algorithms as well as more recent continual semi-supervised methods. Our extensive analysis and ablations demonstrate that DietCL is stable across a full spectrum of label sparsity levels and computational budgets, as well as under various other ablations.