Compared with conventional semantic segmentation under pixel-level supervision, Weakly Supervised Semantic Segmentation (WSSS) with image-level labels faces the challenge that models tend to focus on only the most discriminative regions, which leaves a performance gap relative to the fully supervised setting. A typical symptom is reduced precision at object boundaries, which degrades the overall accuracy of WSSS. To alleviate this issue, we propose to adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) and to process them separately. For the uncertain regions, we employ an activation-based masking strategy and seek to recover the local information with self-distilled knowledge. We further assume that the unmasked confident regions should be robust enough to preserve the global semantics. Building on this assumption, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 demonstrate that our proposed single-stage approach for WSSS not only markedly outperforms state-of-the-art methods but also surpasses multi-stage methods that trade complexity for accuracy. The code can be found at \url{https://github.com/Jessie459/feature-self-reinforcement}.
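To make the partition into deterministic and uncertain regions concrete, the following is a minimal sketch and not the released implementation. It assumes a 2D activation map with values in [0, 1] and two fixed thresholds, `low` and `high`; the function names, the thresholds, and their default values are all hypothetical. The abstract describes an adaptive partition, so fixed thresholds are a simplification chosen here only for illustration.

```python
def partition_regions(activation, low=0.3, high=0.7):
    """Label each cell of a 2D activation map (hypothetical helper).

    Cells below `low` are treated as confident background, cells above
    `high` as confident foreground, and everything in between as
    uncertain. Both thresholds are illustrative assumptions.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    labels = []
    for row in activation:
        labels.append([
            "background" if a < low else "foreground" if a > high else "uncertain"
            for a in row
        ])
    return labels


def uncertain_mask(labels):
    """Return a boolean mask that is True where a cell would be masked."""
    return [[label == "uncertain" for label in row] for row in labels]


# Toy 2x3 activation map: low, ambiguous, and high activations per row.
activation = [
    [0.05, 0.40, 0.90],
    [0.10, 0.65, 0.95],
]
labels = partition_regions(activation)
mask = uncertain_mask(labels)
```

In this toy example the middle column falls between the two thresholds, so it is the only part marked for masking; the outer columns play the role of the confident regions that are assumed to preserve the global semantics.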