Weakly-Supervised Semantic Segmentation (WSSS) aims to train segmentation models using only image-level supervision. Since precise pixel-level annotations are not available, existing methods typically produce pseudo masks for training segmentation models by refining CAM-like heatmaps. However, these heatmaps may cover only the discriminative image regions of an object category, or may include the associated co-occurring backgrounds. To address these issues, we propose Semantic Prompt Learning for WSSS (SemPLeS), a framework that learns to effectively prompt the CLIP latent space to enhance the semantic alignment between the segmented regions and the target object categories. More specifically, we propose Contrastive Prompt Learning and Prompt-guided Semantic Refinement to learn prompts that adequately describe and suppress the co-occurring backgrounds associated with each target object category. In this way, SemPLeS achieves better semantic alignment between object regions and the associated class labels, yielding the desired pseudo masks for training the segmentation model. The proposed SemPLeS framework achieves state-of-the-art performance on the standard WSSS benchmarks, PASCAL VOC and MS COCO, and is compatible with other WSSS methods. The source code is provided in the supplementary material.
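To make the notion of producing pseudo masks from CAM-like heatmaps concrete, the following is a minimal sketch of the generic thresholding step commonly used in WSSS pipelines. It is not the SemPLeS method itself and does not involve CLIP or prompt learning. The function name `cams_to_pseudo_mask`, the background threshold value, and the label convention (0 for background, k + 1 for category k) are assumptions made for illustration.

```python
import numpy as np


def cams_to_pseudo_mask(cams, present_classes, bg_threshold=0.3):
    """Turn per-class CAM-like heatmaps into a pseudo segmentation mask.

    cams: float array of shape (C, H, W), one heatmap per object category.
    present_classes: indices of the categories in the image-level label.
    bg_threshold: assumed cutoff; pixels whose best normalized score falls
        below it are labeled background.
    Returns an (H, W) integer array: 0 = background, k + 1 = category k.
    """
    cams = np.asarray(cams, dtype=np.float64)
    present = np.asarray(present_classes, dtype=np.int64)
    height, width = cams.shape[1:]

    # With no labeled category, every pixel is background.
    if present.size == 0:
        return np.zeros((height, width), dtype=np.int64)

    # Image-level supervision: keep only heatmaps of labeled categories,
    # and discard negative activations.
    selected = np.clip(cams[present], 0.0, None)

    # Normalize each heatmap to [0, 1] by its own maximum.
    peak = selected.max(axis=(1, 2), keepdims=True)
    selected = selected / np.maximum(peak, 1e-8)

    # Assign each pixel to its strongest category, then mark weakly
    # activated pixels as background.
    best = selected.argmax(axis=0)
    score = selected.max(axis=0)
    mask = present[best] + 1
    mask[score < bg_threshold] = 0
    return mask
```

In this sketch, a heatmap that fires only on the most discriminative part of an object leaves the rest of the object labeled as background, while a heatmap that also fires on a co-occurring background assigns those background pixels to the object category. These are the two failure modes that the abstract attributes to CAM-like heatmaps.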