Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks use pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address this, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we adopt an intervention-oriented approach (i.e., frontdoor adjustment) to define a two-step task suitable for unsupervised prediction. The first step constructs a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. The mediator then establishes an explicit link to the second step, concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.
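For readers unfamiliar with frontdoor adjustment, the generic formula from causal inference is sketched below. The symbols are assumptions chosen for illustration and are not taken from the abstract: T stands for the pre-trained features, M for the concept clusterbook acting as mediator, and Y for the resulting semantic grouping. The paper's own notation and exact formulation may differ.

```latex
% Generic frontdoor adjustment (illustrative notation, not the paper's):
%   T = pre-trained features, M = mediator (concept clusterbook),
%   Y = semantic grouping.
\[
P\bigl(Y \mid \mathrm{do}(T = t)\bigr)
  = \sum_{m} P(M = m \mid T = t)
    \sum_{t'} P(Y \mid T = t',\, M = m)\, P(T = t')
\]
```

Under this reading, the outer sum over m loosely corresponds to the first step (building the mediator from the features), and the inner term to the second step (predicting the grouping given the mediator).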