Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using only image-level supervision. Since precise pixel-level annotations are unavailable, existing methods typically produce pseudo masks for training segmentation models by refining CAM-like heatmaps. However, these heatmaps may capture only the most discriminative regions of the target object categories, or may also cover the backgrounds that co-occur with them. To address these issues, we propose a Semantic Prompt Learning for WSSS (SemPLeS) framework, which learns to effectively prompt the CLIP space to enhance the semantic alignment between the segmented regions and the target object categories. More specifically, we propose Contrastive Prompt Learning and Class-associated Semantic Refinement to learn prompts that adequately describe, and thereby suppress, the image backgrounds associated with each target object category. In this way, our framework achieves better semantic matching between object regions and their associated text labels, yielding improved pseudo masks for training the segmentation model. The proposed SemPLeS framework achieves state-of-the-art performance on the standard WSSS benchmarks, PASCAL VOC and MS COCO, and demonstrates interpretability through semantic visualization of the learned prompts. The code will be released.
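The abstract does not give the training objectives, so the following is only a minimal sketch of the general idea of contrastive prompt learning for background suppression, not the SemPLeS formulation. It assumes precomputed, CLIP-style embeddings for the masked background region, a learnable background prompt, the class text label, and the masked foreground region. The function names (`prompt_contrastive_loss`, `background_suppression_loss`) and the temperature value 0.07 are hypothetical choices for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale a vector to unit length (eps avoids division by zero).
    return x / (np.linalg.norm(x) + eps)


def cosine(a, b):
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(l2_normalize(a), l2_normalize(b)))


def prompt_contrastive_loss(bg_img_emb, prompt_emb, class_text_emb, temperature=0.07):
    """Hypothetical InfoNCE-style loss: the background-region embedding should
    match the learnable background prompt (positive) rather than the class
    text embedding (negative)."""
    pos = cosine(bg_img_emb, prompt_emb) / temperature
    neg = cosine(bg_img_emb, class_text_emb) / temperature
    # -log softmax(pos) over {pos, neg}, computed stably with logaddexp.
    return float(np.logaddexp(pos, neg) - pos)


def background_suppression_loss(fg_img_emb, prompt_emb):
    """Hypothetical refinement term: penalize foreground regions whose
    embedding is similar to the learned background prompt."""
    return max(0.0, cosine(fg_img_emb, prompt_emb))


if __name__ == "__main__":
    bg_prompt = np.array([1.0, 0.0])   # toy "background" prompt embedding
    class_text = np.array([0.0, 1.0])  # toy class-label text embedding
    good_bg = np.array([1.0, 0.0])     # background region that looks like background
    leaky_bg = np.array([0.0, 1.0])    # background region that looks like the object
    print(prompt_contrastive_loss(good_bg, bg_prompt, class_text))
    print(prompt_contrastive_loss(leaky_bg, bg_prompt, class_text))
```

In this toy setup, a background region that resembles the class label incurs a much larger loss than one that resembles the background prompt, which is the behavior the abstract describes at a high level. A full system would optimize the prompt embedding (and the mask generator) by gradient descent through a frozen CLIP encoder, which this sketch omits.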