Domain adaptive semantic segmentation aims to make satisfactory dense predictions on an unlabeled target domain by exploiting a supervised model trained on a labeled source domain. In this work, we propose Semantic-Guided Pixel Contrast (SePiCo), a novel one-stage adaptation framework that highlights the semantic concepts of individual pixels to promote the learning of class-discriminative and class-balanced pixel representations across domains, ultimately boosting the performance of self-training methods. Specifically, to explore proper semantic concepts, we first investigate a centroid-aware pixel contrast that employs the category centroids of either the entire source domain or a single source image to guide the learning of discriminative features. Since such semantic concepts may lack category diversity, we then take a distributional perspective that involves a sufficient quantity of instances, namely distribution-aware pixel contrast, in which we approximate the true distribution of each semantic category from the statistics of the labeled source data. Moreover, this optimization objective admits a closed-form upper bound that implicitly involves an infinite number of (dis)similar pairs, making it computationally efficient. Extensive experiments show that SePiCo not only helps stabilize training but also yields discriminative representations, making significant progress on both synthetic-to-real and daytime-to-nighttime adaptation scenarios.
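The centroid-aware pixel contrast described above can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the code below assumes an InfoNCE-style objective over cosine similarities with a temperature `tau`; the function names (`class_centroids`, `centroid_contrast_loss`), the value `tau=0.1`, and the toy features are hypothetical and chosen only for illustration. It uses plain Python lists to stay self-contained, whereas a practical version would presumably operate on feature tensors, with target pixels labeled by self-training pseudo-labels.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def class_centroids(features, labels):
    """Mean feature vector per class, computed from labeled source pixels."""
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_contrast_loss(features, labels, centroids, tau=0.1):
    """Average InfoNCE-style loss (an assumed form): each pixel is pulled
    toward the centroid of its own class and pushed away from the others."""
    classes = sorted(centroids)
    protos = {c: l2_normalize(centroids[c]) for c in classes}
    total = 0.0
    for feat, label in zip(features, labels):
        q = l2_normalize(feat)
        # Temperature-scaled cosine similarity to every class centroid.
        logits = {c: sum(a * b for a, b in zip(q, protos[c])) / tau
                  for c in classes}
        # Numerically stable log-sum-exp over all centroids.
        m = max(logits.values())
        log_denominator = m + math.log(
            sum(math.exp(v - m) for v in logits.values()))
        total += log_denominator - logits[label]
    return total / len(features)


# Toy "pixels": two 2-D features per class (hypothetical data).
source_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
source_labels = [0, 0, 1, 1]

centroids = class_centroids(source_feats, source_labels)
loss_true = centroid_contrast_loss(source_feats, source_labels, centroids)
loss_swapped = centroid_contrast_loss(source_feats, [1, 1, 0, 0], centroids)
```

In this toy setting the loss is lower when pixels are paired with the centroids of their true classes than when the labels are swapped, which is the behavior a class-discriminative objective should show. The distribution-aware variant replaces the single centroid per class with an estimated per-class distribution and is not covered by this sketch.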