Compared with conventional semantic segmentation trained with pixel-level supervision, Weakly Supervised Semantic Segmentation (WSSS) with image-level labels faces the challenge that the model consistently focuses on the most discriminative regions, leaving a gap relative to fully supervised conditions. A typical manifestation is reduced precision at object boundaries, which degrades the accuracy of WSSS. To alleviate this issue, we propose to adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) and to process them separately. For uncertain cues, we employ an activation-based masking strategy and seek to recover the local information with self-distilled knowledge. We further assume that the unmasked confident regions should be robust enough to preserve the global semantics. Building on this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 demonstrate that our proposed single-stage WSSS approach not only remarkably outperforms state-of-the-art methods but also surpasses multi-stage methods that trade complexity for accuracy. The code can be found at https://github.com/Jessie459/feature-self-reinforcement.
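The activation-based partition into deterministic and uncertain regions can be illustrated with a short sketch. This is not the authors' implementation (that lives in the linked repository); it assumes class activation maps normalized to [0, 1], and the thresholds `low` and `high`, the patch size, the majority rule for masking a patch, and the function names `partition_regions` and `mask_uncertain_patches` are all hypothetical choices made for illustration.

```python
import numpy as np


def partition_regions(cam: np.ndarray, low: float = 0.3, high: float = 0.7):
    """Split pixels into confident foreground, confident background and uncertain.

    cam: class activation maps of shape (C, H, W), values assumed in [0, 1].
    low, high: hypothetical thresholds, not values taken from the paper.
    """
    score = cam.max(axis=0)      # strongest class activation at each pixel
    fg = score >= high           # confident foreground
    bg = score <= low            # confident background
    uncertain = ~(fg | bg)       # e.g. object boundaries, ambiguous pixels
    return fg, bg, uncertain


def mask_uncertain_patches(cam: np.ndarray, patch: int = 16,
                           low: float = 0.3, high: float = 0.7) -> np.ndarray:
    """Return a (H // patch, W // patch) boolean mask; True marks a patch to mask.

    Hypothetical rule: a patch is masked when more than half of its pixels
    are uncertain. Trailing rows/columns that do not fill a patch are dropped.
    """
    _, _, uncertain = partition_regions(cam, low, high)
    h, w = uncertain.shape
    hp, wp = h // patch, w // patch
    # Group pixels into non-overlapping patch x patch blocks and take the
    # fraction of uncertain pixels in each block.
    blocks = uncertain[: hp * patch, : wp * patch].reshape(hp, patch, wp, patch)
    frac = blocks.mean(axis=(1, 3))
    return frac > 0.5
```

For example, with two classes on a 32×32 map where class 0 fires strongly (0.9) in the top-left quadrant and class 1 fires weakly (0.5) in the bottom-right quadrant, the top-left quadrant is confident foreground, the bottom-right quadrant is uncertain, and only the bottom-right 16×16 patch is masked. The self-distillation and consistency objectives that act on these regions are not sketched here, since the abstract does not specify their form.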