Semi-supervised semantic segmentation has attracted growing research interest because it reduces the need for large-scale, fully annotated training data by exploiting large amounts of unlabelled data. Current methods often suffer from confirmation bias introduced by the pseudo-labelling process, which a co-training framework can alleviate. However, existing co-training-based methods rely on hand-crafted perturbations to keep the sub-nets from collapsing into each other, and these artificial perturbations cannot lead to the optimal solution. In this work, we propose a new conflict-based cross-view consistency (CCVC) method, built on a two-branch co-training framework, for semi-supervised semantic segmentation. Our aim is to force the two sub-nets to learn informative features from unrelated views. In particular, we first propose a new cross-view consistency (CVC) strategy that introduces a feature discrepancy loss to encourage the two sub-nets to learn distinct features from the same input, while still requiring these distinct features to produce consistent prediction scores for that input. The CVC strategy helps to prevent the two sub-nets from collapsing. We further propose a conflict-based pseudo-labelling (CPL) method that ensures the model learns more useful information from conflicting predictions, which leads to a stable training process. We validate our approach on the widely used benchmark datasets PASCAL VOC 2012 and Cityscapes, where it achieves new state-of-the-art performance.
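The two ideas named in the abstract, a feature discrepancy loss (CVC) and conflict-based pseudo-labelling (CPL), can be illustrated with a small sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the discrepancy loss is taken to be `1 + cosine similarity` between the two sub-nets' feature vectors, and CPL is taken to up-weight pixels where the two branches disagree and the branch providing the pseudo-label is confident. The names `discrepancy_loss`, `conflict_weights`, `threshold`, and `conflict_weight` are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small epsilon avoids division by zero


def discrepancy_loss(feat_a, feat_b):
    """Assumed CVC-style loss: close to 2 for aligned features, 0 for opposite ones.

    Minimizing it pushes the two sub-nets toward dissimilar features.
    """
    return 1.0 + cosine_similarity(feat_a, feat_b)


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conflict_weights(logits_a, logits_b, threshold=0.9, conflict_weight=2.0):
    """Assumed CPL-style weighting for supervising branch B with branch A's labels.

    Returns (pseudo_labels, weights), one entry per pixel. A pixel gets
    `conflict_weight` when the branches disagree and branch A is confident;
    every other pixel keeps weight 1.0.
    """
    labels, weights = [], []
    for pixel_a, pixel_b in zip(logits_a, logits_b):
        prob_a, prob_b = softmax(pixel_a), softmax(pixel_b)
        label_a = max(range(len(prob_a)), key=prob_a.__getitem__)
        label_b = max(range(len(prob_b)), key=prob_b.__getitem__)
        confident = prob_a[label_a] >= threshold
        conflicting = label_a != label_b
        labels.append(label_a)
        weights.append(conflict_weight if (confident and conflicting) else 1.0)
    return labels, weights


if __name__ == "__main__":
    # Toy example: three pixels, two classes.
    logits_a = [[5.0, 0.0], [0.2, 0.0], [5.0, 0.0]]
    logits_b = [[0.0, 1.0], [0.0, 1.0], [3.0, 0.0]]
    print(discrepancy_loss([1.0, 0.0], [0.0, 1.0]))
    print(conflict_weights(logits_a, logits_b))
```

In a real training loop, the per-pixel weights would multiply an unreduced cross-entropy loss, and the same weighting would be applied symmetrically with the roles of the two branches swapped. This sketch uses plain Python lists only to stay self-contained.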