Semi-supervised semantic segmentation (SSSS) has been proposed to alleviate the burden of time-consuming pixel-level manual labeling by leveraging limited labeled data along with larger amounts of unlabeled data. Current state-of-the-art methods train on labeled data with ground-truth annotations and on unlabeled data with pseudo-labels. However, the two training flows are separate, which allows labeled data to dominate the training process, resulting in low-quality pseudo-labels and, consequently, sub-optimal results. To alleviate this issue, we present AllSpark, which reborns the labeled features from unlabeled ones with a channel-wise cross-attention mechanism. We further introduce a Semantic Memory along with a Channel Semantic Grouping strategy to ensure that unlabeled features adequately represent labeled features. AllSpark sheds new light on architecture-level designs for SSSS rather than framework-level ones, avoiding increasingly complicated training pipeline designs. It can also be regarded as a flexible bottleneck module that can be seamlessly integrated into a general transformer-based segmentation model. The proposed AllSpark outperforms existing methods across all evaluation protocols on the Pascal, Cityscapes, and COCO benchmarks without bells and whistles. Code and model weights are available at: https://github.com/xmed-lab/AllSpark.
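The core operation described above — regenerating labeled features from unlabeled ones via channel-wise cross-attention — can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the authors' implementation: feature maps are flattened to shape (channels, positions), learned query/key/value projections are omitted, and a single attention head is used. Queries come from labeled features while keys and values come from unlabeled features, so the attention matrix mixes channels rather than spatial positions:

```python
import numpy as np

def channel_wise_cross_attention(labeled, unlabeled):
    """Sketch: rebuild labeled features as channel-wise mixtures of unlabeled ones.

    labeled, unlabeled: (C, N) arrays — C channels, N flattened spatial positions.
    Learned Q/K/V projections are omitted here for brevity (an assumption).
    """
    C, N = labeled.shape
    q, k, v = labeled, unlabeled, unlabeled
    scores = q @ k.T / np.sqrt(N)                # (C, C) channel affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1 over unlabeled channels
    return attn @ v                              # (C, N) reborn labeled features

rng = np.random.default_rng(0)
out = channel_wise_cross_attention(rng.normal(size=(8, 16)),
                                   rng.normal(size=(8, 16)))
print(out.shape)  # (8, 16)
```

Because each output channel is a convex combination of unlabeled-feature channels, the unlabeled data directly shapes the labeled branch's representation, which is the intuition behind preventing labeled data from dominating training.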