Unsupervised object discovery (UOD) is the task of separating whole object regions from the background of a scene without relying on labeled datasets; it benefits both bounding-box-level localization and pixel-level segmentation. The task is promising because it discovers objects in a generic manner. Existing techniques fall roughly into two directions: generative solutions based on image resynthesis, and clustering methods based on self-supervised models. We observe that the former relies heavily on the quality of image reconstruction, while the latter is limited in how effectively it models semantic correlations. To target object discovery directly, we focus on the latter approach and propose a novel solution that incorporates weakly-supervised contrastive learning (WCL) to enhance the exploration of semantic information. We design a semantic-guided self-supervised learning model that extracts high-level semantic features from images by fine-tuning the feature encoder of a self-supervised model, namely DINO, via WCL. We then apply Principal Component Analysis (PCA) to localize object regions: the principal projection direction, corresponding to the maximal eigenvalue, serves as an indicator of the object region(s). Extensive experiments on benchmark unsupervised object discovery datasets demonstrate the effectiveness of the proposed solution. The source code and experimental results are publicly available via our project page at https://github.com/npucvr/WSCUOD.git.
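To make the PCA localization step concrete, the following is a minimal NumPy sketch, not the released implementation. It assumes patch-level features of shape (N, D) have already been extracted by some encoder (for example a fine-tuned DINO backbone) and that the N patches form an h × w grid. The function names `first_principal_projection` and `object_mask` are hypothetical, and so is the rule used to resolve the sign of the eigenvector, which the abstract does not specify.

```python
import numpy as np


def first_principal_projection(features):
    """Project patch features (N, D) onto the first principal direction.

    The direction is the eigenvector of the feature covariance matrix
    that corresponds to the maximal eigenvalue.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(features) - 1, 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices,
    # so the last column is the direction of maximal variance.
    _, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, -1]
    return centered @ direction


def object_mask(features, grid_hw):
    """Threshold the first principal projection into a binary patch mask."""
    h, w = grid_hw
    proj = first_principal_projection(features).reshape(h, w)
    mask = proj > 0
    # ASSUMPTION (not from the source): the eigenvector sign is arbitrary,
    # so we guess that the object covers fewer than half of the patches
    # and flip the mask otherwise.
    if mask.sum() > mask.size / 2:
        mask = ~mask
    return mask
```

Calling `object_mask(features, (h, w))` yields a boolean h × w map of candidate object patches; a bounding box or a pixel-level mask could then be derived from it by upsampling to the image resolution. Thresholding at zero is one simple choice for this sketch, and other thresholds or sign rules may suit real features better.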