Despite the success of deep learning methods on instance segmentation, these models still suffer from catastrophic forgetting in continual learning scenarios. In this paper, our contributions to continual instance segmentation are threefold. First, we propose Y-knowledge distillation (Y-KD), a knowledge distillation strategy that shares a common feature extractor between the teacher and student networks. Because the teacher is also updated with new data in Y-KD, the increased plasticity yields new modules that are specialized for the new classes. Second, our Y-KD approach is supported by a dynamic architecture method that grows new modules for each task and uses all of them at inference with a single instance segmentation head, which significantly reduces forgetting. Third, we complete our approach with checkpoint averaging, a simple method to manually balance the trade-off between performance on the various sets of classes, thus increasing control over the model's behavior at no additional cost. These contributions are combined in a model that we name the Dynamic Y-KD network. We perform extensive experiments on several single-step and multi-step scenarios on Pascal-VOC, and we show that our approach outperforms previous methods on both past and new classes. For instance, compared to recent work, our method obtains +2.1% mAP on old classes in the 15-1 scenario, +7.6% mAP on new classes in 19-1, and reaches 91.5% of the mAP obtained by joint training on all classes in 15-5.
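To make the checkpoint-averaging idea concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that the two checkpoints being combined are the model weights saved before and after training on a new step, and that they are linearly interpolated with a single coefficient. The function name `average_checkpoints`, the coefficient `alpha`, and the toy parameter names are all hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List

# Toy stand-in for a framework state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def average_checkpoints(old: StateDict, new: StateDict, alpha: float) -> StateDict:
    """Linearly interpolate two checkpoints with matching names and sizes.

    alpha = 1.0 returns the old weights, alpha = 0.0 returns the new weights.
    (Assumed convention for this sketch; not taken from the paper.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if old.keys() != new.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name in old:
        if len(old[name]) != len(new[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise convex combination of the two weight vectors.
        merged[name] = [
            alpha * o + (1.0 - alpha) * n for o, n in zip(old[name], new[name])
        ]
    return merged


# Hypothetical weights saved before and after learning a new set of classes.
before_step = {"head.weight": [1.0, 2.0], "head.bias": [0.0]}
after_step = {"head.weight": [3.0, 6.0], "head.bias": [1.0]}

# alpha = 0.5 weights both checkpoints equally.
balanced = average_checkpoints(before_step, after_step, alpha=0.5)
```

Under these assumptions, moving `alpha` toward 1.0 would favor the weights learned on old classes and moving it toward 0.0 would favor the new ones. The interpolation itself requires no further training, which is consistent with the claim that the trade-off can be tuned at no additional cost.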