We revisit the common practice of evaluating the adaptation of Online Continual Learning (OCL) algorithms with online accuracy, a metric that measures the model's accuracy on the immediately following few samples. We show that this metric is unreliable: even vacuous blind classifiers, which do not use input images for prediction, can achieve unrealistically high online accuracy by exploiting spurious label correlations in the data stream. Our study reveals that existing OCL algorithms can also achieve high online accuracy yet perform poorly at retaining useful information, suggesting that they unintentionally learn spurious label correlations. To address this issue, we propose a novel metric for measuring adaptation based on accuracy on near-future samples, from which spurious correlations are removed. We benchmark existing OCL approaches using our proposed metric on large-scale datasets under various computational budgets and find that better generalization can be achieved by retaining and reusing previously seen information. We believe that our proposed metric can aid in the development of truly adaptive OCL methods. We provide code to reproduce our results at https://github.com/drimpossible/EvalOCL.
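The blind-classifier argument can be illustrated with a small sketch. It is not the authors' benchmark or their exact metric definition: the synthetic stream, the function names `make_stream` and `blind_accuracy`, and the choice of a fixed `shift` to stand in for "near-future" evaluation are all assumptions made for illustration. The stream emits labels in runs, so a classifier that ignores its input and repeats the last label it saw scores well on the very next sample, while the same strategy falls to roughly chance once the evaluation target is far enough ahead that the label correlation has disappeared.

```python
import random


def make_stream(num_samples, num_classes, run_length, seed=0):
    """Build a synthetic label stream in which each label repeats in runs.

    Hypothetical setup: it only mimics the temporal label correlation
    described in the abstract, not any real dataset.
    """
    rng = random.Random(seed)
    labels = []
    while len(labels) < num_samples:
        label = rng.randrange(num_classes)
        labels.extend([label] * run_length)
    return labels[:num_samples]


def blind_accuracy(labels, shift=1):
    """Accuracy of a blind classifier that never looks at any input.

    It predicts the label observed at step t for the sample at step
    t + shift. shift=1 corresponds to evaluating on the immediate next
    sample; a large shift is an assumed stand-in for near-future evaluation.
    """
    pairs = list(zip(labels[:-shift], labels[shift:]))
    correct = sum(1 for seen, target in pairs if seen == target)
    return correct / len(pairs)


labels = make_stream(num_samples=10_000, num_classes=10, run_length=50)

# Evaluated on the immediate next sample, the blind classifier looks strong.
online_acc = blind_accuracy(labels, shift=1)

# Evaluated 500 steps ahead (10 runs later), the label correlation is gone.
near_future_acc = blind_accuracy(labels, shift=500)

print(f"blind classifier, next-sample accuracy: {online_acc:.3f}")
print(f"blind classifier, shifted accuracy:     {near_future_acc:.3f}")
```

In this toy setting the gap between the two numbers comes entirely from label ordering, since no input features exist at all. That is the sense in which a high next-sample score need not reflect anything learned about the inputs.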