Despite the success of deep learning models on instance segmentation, current methods still suffer from catastrophic forgetting in continual learning scenarios. In this paper, we make three contributions to continual instance segmentation. First, we propose Y-knowledge distillation (Y-KD), a technique that shares a common feature extractor between the teacher and student networks. Because the teacher is also updated with new data in Y-KD, the increased plasticity results in new modules that are specialized on new classes. Second, our Y-KD approach is supported by a dynamic architecture method that trains task-specific modules with a unique instance segmentation head, thereby significantly reducing forgetting. Third, we complete our approach by leveraging checkpoint averaging as a simple way to manually balance the trade-off between performance on the various sets of classes, thus increasing control over the model's behavior at no additional cost. These contributions are united in a model that we name the Dynamic Y-KD network. We perform extensive experiments on several single-step and multi-step incremental learning scenarios, and we show that our approach outperforms previous methods on both past and new classes. For instance, compared to recent work, our method obtains +2.1% mAP on old classes in 15-1 and +7.6% mAP on new classes in 19-1, and reaches 91.5% of the mAP obtained by joint training on all classes in 15-5.
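The checkpoint averaging idea can be illustrated with a minimal sketch. The abstract does not specify how the averaging is performed, so the code below assumes a plain linear interpolation between a checkpoint saved before training on the new classes and one saved after it. The function name `average_checkpoints`, the mixing weight `alpha`, and the use of flat lists of floats in place of tensors are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# A checkpoint is modeled here as a mapping from parameter name to a flat
# list of floats. This is a stand-in for real tensors (assumption).
Checkpoint = Dict[str, List[float]]


def average_checkpoints(old: Checkpoint, new: Checkpoint, alpha: float) -> Checkpoint:
    """Linearly interpolate two checkpoints with identical parameter layouts.

    alpha = 1.0 returns the old weights (favoring past classes),
    alpha = 0.0 returns the new weights (favoring new classes).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if old.keys() != new.keys():
        raise ValueError("checkpoints must contain the same parameter names")

    merged: Checkpoint = {}
    for name in old:
        if len(old[name]) != len(new[name]):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [
            alpha * o + (1.0 - alpha) * n for o, n in zip(old[name], new[name])
        ]
    return merged


# Usage: blend the two checkpoints equally.
before = {"head.weight": [0.0, 2.0]}
after = {"head.weight": [4.0, 2.0]}
balanced = average_checkpoints(before, after, alpha=0.5)
```

Under this assumption, the trade-off is adjusted after training by changing `alpha` alone, which is consistent with the abstract's claim that the balance comes at no additional cost.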