Online Continual Learning (OCL) addresses the problem of training neural networks on a continuous data stream in which multiple classification tasks emerge in sequence. In contrast to offline Continual Learning, each data sample can be seen only once in OCL. In this context, replay-based strategies have achieved impressive results, and most state-of-the-art approaches depend heavily on them. While Knowledge Distillation (KD) has been used extensively in offline Continual Learning, it remains under-exploited in OCL despite its potential. In this paper, we theoretically analyze the challenges of applying KD to OCL. We introduce a direct yet effective methodology for applying Momentum Knowledge Distillation (MKD) to many flagship OCL methods and demonstrate its ability to enhance existing approaches. In addition to improving existing state-of-the-art accuracy by more than $10$ percentage points on ImageNet100, we shed light on MKD's internal mechanics and its impact during training in OCL. We argue that, like replay, MKD should be considered a central component of OCL.