Weakly supervised semantic segmentation (WSSS), which relies on weak forms of labels, has been actively studied as a way to reduce the cost of acquiring pixel-level annotations. However, classifiers trained on biased datasets tend to exploit shortcut features and make predictions based on spurious correlations between certain backgrounds and objects, which leads to poor generalization. In this paper, we propose shortcut mitigating augmentation (SMA) for WSSS, which generates synthetic representations of object-background combinations not seen in the training data in order to reduce reliance on shortcut features. Our approach disentangles object-relevant features from background features. We then shuffle and combine the disentangled representations to create synthetic features for diverse object-background combinations. A classifier trained with SMA depends less on context and focuses more on the target object when making predictions. In addition, we analyze how the classifier's use of shortcuts changes after applying our augmentation, using a metric based on an attribution method. The proposed method improves semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.
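
The shuffle-and-combine step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the object and background features have already been disentangled into separate vectors, and it assumes that "combine" means plain concatenation, a detail the abstract does not specify. The function name `shuffle_and_combine` and the toy feature values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def shuffle_and_combine(
    object_feats: Sequence[Vector],
    background_feats: Sequence[Vector],
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Pair each object feature with a background feature drawn from the batch.

    Assumptions (not stated in the abstract): the inputs are already
    disentangled, and combining is done by concatenation.
    """
    if len(object_feats) != len(background_feats):
        raise ValueError("expected one background feature per object feature")

    # A random permutation of batch indices decides which background goes where.
    perm = list(range(len(background_feats)))
    random.Random(seed).shuffle(perm)

    # Synthetic feature i = object feature i followed by background feature perm[i].
    synthetic = [
        list(obj) + list(background_feats[j])
        for obj, j in zip(object_feats, perm)
    ]
    return synthetic, perm


# Toy batch: 3 samples with 2-d object features and 2-d background features.
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backgrounds = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
synthetic, perm = shuffle_and_combine(objects, backgrounds, seed=0)
```

Each synthetic vector keeps its original object part, so an object can appear alongside a background it was not observed with in the batch. A random permutation may leave some pairs unchanged, which this sketch does not try to prevent.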