Although instance segmentation methods have improved considerably, the dominant paradigm still relies on fully annotated training images, which are tedious to obtain. To reduce this reliance and improve results, semi-supervised approaches use unlabeled data as an additional training signal that limits overfitting to the labeled samples. In this context, we present novel design choices that significantly improve teacher-student distillation models. In particular, we (i) improve the distillation approach by introducing a novel "guided burn-in" stage, and (ii) evaluate different instance segmentation architectures, backbone networks, and pre-training strategies. In contrast to previous work, which uses only supervised data during the burn-in period of the student model, we also use guidance from the teacher model to exploit unlabeled data during burn-in. Our improved distillation approach yields substantial gains over previous state-of-the-art results. For example, on the Cityscapes dataset we improve mask-AP from 23.7 to 33.9 when using labels for 10\% of the images, and on the COCO dataset we improve mask-AP from 18.3 to 34.1 when using labels for only 1\% of the training data.
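
To make the contrast between supervised-only and guided burn-in concrete, the following is a minimal sketch of how the loss for one student burn-in step might be assembled. It is an illustration under stated assumptions, not the method's actual implementation: the function name `burn_in_loss`, the confidence filter `score_threshold`, and the weighting factor `unsup_weight` are all hypothetical, and the abstract does not specify how teacher predictions are filtered or weighted.

```python
from typing import Sequence


def burn_in_loss(
    supervised_loss: float,
    pseudo_label_losses: Sequence[float],
    teacher_scores: Sequence[float],
    score_threshold: float = 0.5,  # hypothetical value, not from the source
    unsup_weight: float = 1.0,     # hypothetical value, not from the source
    guided: bool = True,
) -> float:
    """Combine labeled and pseudo-labeled loss terms for one burn-in step.

    supervised_loss: student loss on a batch of labeled images.
    pseudo_label_losses: student loss per teacher-predicted instance
        on a batch of unlabeled images.
    teacher_scores: teacher confidence for each of those instances.
    """
    if len(pseudo_label_losses) != len(teacher_scores):
        raise ValueError("one teacher score is needed per pseudo-label loss")

    if not guided:
        # Supervised-only burn-in: unlabeled images contribute nothing.
        return supervised_loss

    # Assumed filter: keep only pseudo-labels the teacher is confident about.
    kept = [
        loss
        for loss, score in zip(pseudo_label_losses, teacher_scores)
        if score >= score_threshold
    ]
    if not kept:
        # No confident pseudo-label in this batch: fall back to the labeled term.
        return supervised_loss

    return supervised_loss + unsup_weight * sum(kept) / len(kept)
```

With `guided=False` the function reduces to the supervised-only burn-in used in previous work; with `guided=True` the unlabeled batch adds a second term driven by the teacher's predictions.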