In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy to prevent forgetting. KD-based methods are used successfully in CIL, but they often struggle to regularize the model when no exemplars from previous tasks are available. Our analysis reveals that this issue stems from substantial representation shifts in the teacher network when it processes out-of-distribution data. These shifts cause large errors in the KD loss component, which degrades CIL performance. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that updates the teacher concurrently with the main model during incremental training. Our method integrates seamlessly with KD-based CIL approaches and consistently improves their performance across multiple exemplar-free CIL benchmarks.
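The effect described above can be illustrated with a toy sketch. The abstract does not say how the teacher is updated, so the sketch assumes one mechanism common in test-time adaptation: refreshing the teacher's batch-normalization statistics on new-task data while its weights stay fixed. The class `BatchNorm`, the function `kd_loss`, the squared-error distillation loss, and the synthetic data are all hypothetical stand-ins and are not taken from the paper.

```python
import copy
import math
import random


class BatchNorm:
    """Scalar batch normalization with running statistics (toy stand-in for a network)."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.running_mean = 0.0
        self.running_var = 1.0
        self.momentum = momentum
        self.eps = eps

    def forward(self, batch, update_stats):
        if update_stats:
            # Training-style pass: normalize with the batch statistics and
            # fold them into the running estimates.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Frozen pass: normalize with the statistics stored from old tasks.
            mean, var = self.running_mean, self.running_var
        return [(x - mean) / math.sqrt(var + self.eps) for x in batch]


def kd_loss(teacher, student, batch, adapt_teacher):
    """Mean squared error between teacher and student outputs on one batch."""
    t_out = teacher.forward(batch, update_stats=adapt_teacher)
    s_out = student.forward(batch, update_stats=True)
    return sum((t - s) ** 2 for t, s in zip(t_out, s_out)) / len(batch)


random.seed(0)

# Assumption: the teacher's stored statistics match old-task data (mean 0, variance 1).
teacher = BatchNorm()

# Assumption: new-task data is shifted relative to the old tasks.
new_task_batch = [random.gauss(5.0, 2.0) for _ in range(64)]

# In both cases the student starts as an exact copy of the teacher.
loss_frozen = kd_loss(copy.deepcopy(teacher), copy.deepcopy(teacher),
                      new_task_batch, adapt_teacher=False)
loss_adapted = kd_loss(copy.deepcopy(teacher), copy.deepcopy(teacher),
                       new_task_batch, adapt_teacher=True)

print(f"KD loss, frozen teacher:  {loss_frozen:.3f}")
print(f"KD loss, adapted teacher: {loss_adapted:.3f}")
```

In this sketch the student is an exact copy of the teacher, so an ideal distillation loss would be zero at the start of the new task. With the frozen teacher, the stale statistics alone produce a large loss; with the adapted teacher, the two outputs coincide. A real network would add learned weights and a task loss, which this sketch leaves out.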