Semi-Supervised Semantic Segmentation (S4) aims to train a segmentation model from a small set of labeled images and a large volume of unlabeled images. To improve the robustness of representations, recent strong methods introduce pixel-wise contrastive learning in latent space (i.e., representation space), which pulls representations toward their prototypes in a fully supervised manner. However, previous contrastive-based S4 methods rely solely on supervision from the model's output (logits) in logit space when training on unlabeled data. In contrast, we use the outputs in both logit space and representation space to obtain supervision collaboratively. Supervision from the two spaces plays two roles: 1) it reduces the risk of over-fitting to incorrect semantic information in the logits with the help of the representations; 2) it enhances knowledge exchange between the two spaces. Furthermore, unlike previous approaches, we use the similarity between representations and prototypes as a new indicator to tilt training toward under-performing representations, achieving a more efficient contrastive learning process. Results on two public benchmarks demonstrate that our method is competitive with state-of-the-art methods.
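The two ideas in the abstract, collaborative supervision from both spaces and a similarity-based indicator, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the agreement rule for pseudo-labels, the weight `(1 - similarity) / 2`, the temperature value, and all function names are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def prototype_similarity(reps, prototypes):
    # reps: (N, D) pixel representations; prototypes: (C, D) class prototypes.
    return l2_normalize(reps) @ l2_normalize(prototypes).T  # shape (N, C)

def collaborative_pseudo_labels(logits, sims):
    # Assumed rule: trust a pseudo-label only where the logit-space prediction
    # and the nearest prototype in representation space agree.
    logit_labels = logits.argmax(axis=1)
    rep_labels = sims.argmax(axis=1)
    return logit_labels, logit_labels == rep_labels

def similarity_weights(sims, labels):
    # Assumed indicator: representations far from their own prototype
    # (low similarity) receive a larger weight, in [0, 1].
    own_sim = sims[np.arange(len(labels)), labels]
    return (1.0 - own_sim) / 2.0

def weighted_contrastive_loss(sims, labels, mask, temperature=0.1):
    # Prototype-based InfoNCE per pixel, re-weighted by the indicator above.
    scaled = sims / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    log_prob = scaled - np.log(np.exp(scaled).sum(axis=1, keepdims=True))
    per_pixel = -log_prob[np.arange(len(labels)), labels]
    w = similarity_weights(sims, labels) * mask
    return float((w * per_pixel).sum() / max(w.sum(), 1e-12))

# Toy example: three pixels, two classes, 2-D representations.
reps = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 3.0]])

sims = prototype_similarity(reps, prototypes)
labels, agree = collaborative_pseudo_labels(logits, sims)
weights = similarity_weights(sims, labels)
loss = weighted_contrastive_loss(sims, labels, agree)
```

In this toy setup the third pixel is predicted as class 1 by the logits but lies closer to the class 0 prototype, so the two spaces disagree and the pixel is masked out; pixels that sit exactly on their prototype get zero weight, which shifts the training signal toward poorly aligned representations.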