Modern incremental learning methods for semantic segmentation usually learn new categories from dense annotations. Although they achieve promising results, pixel-by-pixel labeling is costly and time-consuming. Weakly incremental learning for semantic segmentation (WILSS) is a novel and attractive task that aims to learn to segment new classes from cheap and widely available image-level labels. Despite achieving comparable results, image-level labels cannot provide the details needed to locate each segment, which limits the performance of WILSS. This motivates us to consider how to improve and effectively utilize the supervision of new classes given only image-level labels, while avoiding forgetting old ones. In this work, we propose a novel and data-efficient framework for WILSS, named FMWISS. Specifically, we propose pre-training-based co-segmentation to distill the knowledge of complementary foundation models for generating dense pseudo labels. We further optimize the noisy pseudo masks with a teacher-student architecture, where a plug-in teacher is optimized with a proposed dense contrastive loss. Moreover, we introduce memory-based copy-paste augmentation to alleviate the catastrophic forgetting of old classes. Extensive experiments on the Pascal VOC and COCO datasets demonstrate the superior performance of our framework; e.g., FMWISS achieves 70.7% and 73.3% in the 15-5 VOC setting, outperforming the state-of-the-art method by 3.4% and 6.1%, respectively.
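The abstract does not say how the memory for copy-paste augmentation is built or sampled. As a rough illustration of the general idea only, the sketch below assumes a small store of old-class object patches, each with a binary mask and class id, and pastes one stored object into a new-step training image while writing the old class into the label map. The class `CopyPasteMemory`, its methods `add` and `paste`, and the random-replacement policy are hypothetical choices for this sketch, not FMWISS's actual implementation.

```python
import numpy as np


class CopyPasteMemory:
    """Hypothetical fixed-size memory of old-class object patches (illustrative only)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []  # list of (rgb_patch, binary_mask, class_id)
        self.rng = np.random.default_rng(seed)

    def add(self, patch, mask, class_id):
        # Assumed policy: append until full, then overwrite a random slot.
        if len(self.items) < self.capacity:
            self.items.append((patch, mask, class_id))
        else:
            self.items[self.rng.integers(self.capacity)] = (patch, mask, class_id)

    def paste(self, image, label):
        """Paste one stored object at a random location; inputs are not modified."""
        image, label = image.copy(), label.copy()
        if not self.items:
            return image, label
        patch, mask, class_id = self.items[self.rng.integers(len(self.items))]
        ph, pw = mask.shape
        h, w = label.shape
        if ph > h or pw > w:  # object does not fit; skip augmentation
            return image, label
        y = self.rng.integers(0, h - ph + 1)
        x = self.rng.integers(0, w - pw + 1)
        region = (slice(y, y + ph), slice(x, x + pw))
        # Slicing returns views, so these masked writes update image/label in place.
        image[region][mask] = patch[mask]
        label[region][mask] = class_id
        return image, label


# Usage: store one 2x2 object of (assumed) old class 3 with 3 foreground pixels.
mem = CopyPasteMemory(capacity=4)
patch = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[True, True], [True, False]])
mem.add(patch, mask, class_id=3)

img = np.zeros((8, 8, 3), dtype=np.uint8)
lbl = np.zeros((8, 8), dtype=np.int64)
new_img, new_lbl = mem.paste(img, lbl)
```

Pasting stored old-class objects into new-step images keeps some old-class pixels, with correct labels, in every batch; this is one common way copy-paste is used against forgetting, and the real method may differ in how objects are stored, selected, and placed.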