Image segmentation is a fundamental task in computer vision. Annotating data to train supervised methods can be labor-intensive, which motivates unsupervised methods. Current approaches often extract deep features from pre-trained networks to construct a graph, and then apply classical clustering methods such as k-means and normalized cuts as a post-processing step. However, this approach reduces the high-dimensional information encoded in the features to pairwise scalar affinities. To address this limitation, this study introduces a lightweight Graph Neural Network (GNN) that replaces classical clustering methods while optimizing the same clustering objective function. Unlike existing methods, our GNN takes as input both the pairwise affinities between local image features and the raw features themselves. This direct connection between the raw features and the clustering objective lets us implicitly classify clusters across different graphs, yielding semantic part segmentation without additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training an image segmentation GNN. Furthermore, we employ the Correlation Clustering (CC) objective, which clusters without a predefined number of clusters and so enables k-less clustering. We apply the proposed method to object localization, segmentation, and semantic part segmentation, surpassing state-of-the-art performance on multiple benchmarks.
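The abstract does not state the loss formulas. As a minimal sketch of how classical clustering objectives can act as self-supervised losses on a GNN's soft cluster assignments, the NumPy code below assumes a MinCutPool-style relaxation of normalized cut and a soft relaxation of correlation clustering. The function names, the cosine-similarity graph, and the threshold `tau` are illustrative assumptions, not the paper's implementation. A GNN would output the assignment matrix `S` (here via a row-wise softmax over logits), and training would minimize the loss with respect to the GNN weights using an autodiff framework; this sketch only evaluates the losses.

```python
import numpy as np


def soft_assign(logits):
    """Row-wise softmax: each node (image patch) gets a distribution over k clusters."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cosine(features):
    """Pairwise cosine similarity between node feature vectors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return f @ f.T


def ncut_loss(S, features):
    """Relaxed normalized cut in the style of MinCutPool (assumed, not the paper's exact loss).

    S is an (n, k) soft assignment matrix; W is a non-negative affinity graph.
    """
    W = np.clip(cosine(features), 0.0, None)  # normalized cut needs non-negative weights
    D = np.diag(W.sum(axis=1))  # degree matrix
    cut_term = -np.trace(S.T @ W @ S) / np.trace(S.T @ D @ S)
    # Orthogonality term: favors orthogonal, similarly sized clusters, which rules out
    # the trivial solution that puts every node into a single cluster.
    k = S.shape[1]
    StS = S.T @ S
    ortho_term = np.linalg.norm(StS / np.linalg.norm(StS) - np.eye(k) / np.sqrt(k))
    return float(cut_term + ortho_term)


def cc_loss(S, features, tau=0.5):
    """Relaxed correlation clustering: penalizes disagreements with a signed graph.

    tau is a hypothetical threshold: pairs with cosine similarity above tau attract,
    pairs below it repel. The number of columns k of S is only an upper bound;
    unused columns cost nothing, so the effective number of clusters is not fixed.
    """
    W = cosine(features) - tau
    np.fill_diagonal(W, 0.0)  # ignore self-pairs
    P = S @ S.T  # P[i, j]: probability that nodes i and j share a cluster
    attract = np.clip(W, 0.0, None)
    repel = np.clip(-W, 0.0, None)
    # Attracting pairs are penalized when split, repelling pairs when merged.
    return float(np.sum(attract * (1.0 - P)) + np.sum(repel * P))


# Usage: two well-separated pairs of feature vectors.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
bad = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
print(ncut_loss(good, feats) < ncut_loss(bad, feats))  # True
print(cc_loss(good, feats) < cc_loss(bad, feats))  # True
```

In this toy case both losses score the correct partition lower than the mixed one. Appending an unused third column to `good` leaves its CC loss at zero, which is the sense in which k acts only as an upper bound for correlation clustering, whereas the normalized-cut loss depends on the chosen k through its orthogonality term.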