Semi-supervised semantic segmentation (SSS) is an important task that uses both labeled and unlabeled data to reduce the cost of labeling training examples. However, the effectiveness of SSS algorithms is limited by the difficulty of fully exploiting unlabeled data. To address this, we propose a dual-level Siamese structure network (DSSN) for pixel-wise contrastive learning. By aligning positive pairs with a pixel-wise contrastive loss computed on strongly augmented views in both the low-level image space and the high-level feature space, DSSN is designed to make maximal use of the available unlabeled data. Additionally, we introduce a novel class-aware pseudo-label selection strategy for weak-to-strong supervision. Most existing methods either perform no selection at all or apply a single predefined threshold to all classes. In contrast, our strategy selects the highest-confidence predictions of the weak view separately for each class and uses them as pseudo labels to supervise the strongly augmented views. This accounts for class imbalance and improves performance on long-tailed classes. Our method achieves state-of-the-art results on two datasets, PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.
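The class-aware selection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `class_aware_pseudo_labels`, the `keep_ratio` parameter (the fraction of pixels kept per class), and the `ignore_index` value of 255 are all assumptions made for illustration, since the abstract does not state how many predictions are kept per class.

```python
import numpy as np


def class_aware_pseudo_labels(probs, keep_ratio=0.5, ignore_index=255):
    """Keep only the most confident weak-view predictions of each class.

    probs: array of shape (C, H, W) with per-pixel class probabilities
           predicted on the weakly augmented view.
    keep_ratio: assumed hyperparameter, fraction of pixels kept per class.
    Returns an (H, W) integer map; unselected pixels get ignore_index.
    """
    conf = probs.max(axis=0)      # per-pixel confidence
    pred = probs.argmax(axis=0)   # per-pixel predicted class
    labels = np.full(pred.shape, ignore_index, dtype=np.int64)

    for c in np.unique(pred):
        mask = pred == c
        # Number of pixels to keep for this class (at least one).
        k = max(1, int(np.ceil(keep_ratio * mask.sum())))
        # Per-class threshold: the k-th largest confidence within class c.
        thresh = np.partition(conf[mask], -k)[-k]
        # Ties at the threshold are all kept, so slightly more than k
        # pixels may be selected.
        labels[mask & (conf >= thresh)] = c
    return labels
```

Because the threshold is computed per class rather than globally, a rare class whose predictions are uniformly less confident still contributes pseudo labels, which is the behavior the abstract attributes to the strategy. The resulting map would then supervise the predictions on the strongly augmented views, with `ignore_index` pixels excluded from the loss.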