Recently, representation learning with contrastive learning algorithms has been successfully applied to challenging unlabeled datasets. However, these methods cannot distinguish important features from unimportant ones in purely unsupervised settings, and the definition of importance varies with the downstream task or analysis goal, such as identifying objects or backgrounds. In this paper, we focus on unsupervised image clustering as the downstream task and propose a representation learning method that enhances the features critical to the clustering task. We extend a clustering-friendly contrastive learning method and incorporate a contrastive analysis approach, which uses a reference dataset to separate important features from unimportant ones, into the design of the loss functions. In an experimental evaluation of image clustering on three datasets with characteristic backgrounds, we show that our method achieves higher clustering scores on all datasets than conventional contrastive analysis and deep clustering methods.
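The abstract does not specify the paper's loss functions, but the underlying contrastive-analysis idea can be illustrated with contrastive PCA (cPCA), a standard method that finds directions with high variance in a target dataset and low variance in a reference (background) dataset. This is a minimal sketch of that general idea only, not the paper's loss-based formulation; the function name and parameters are illustrative.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Sketch of contrastive PCA: find projection directions with high
    variance in the target data but low variance in the reference
    (background) data, so features shared with the reference dataset
    (the "unimportant" ones) are suppressed.

    Illustrative only -- the paper extends a contrastive *learning*
    method rather than using cPCA directly.
    """
    # Covariance of each dataset (features in columns).
    ct = np.cov(target, rowvar=False)
    cb = np.cov(background, rowvar=False)
    # Top eigenvectors of the contrastive covariance C_target - alpha * C_background.
    eigvals, eigvecs = np.linalg.eigh(ct - alpha * cb)
    order = np.argsort(eigvals)[::-1]  # largest contrastive variance first
    return eigvecs[:, order[:n_components]]  # columns are projection directions

# Toy example: feature 0 varies strongly in both datasets (an unimportant,
# background-like feature); feature 1 varies strongly only in the target.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 1.0])
target = rng.normal(size=(500, 3)) * np.array([5.0, 4.0, 1.0])
W = contrastive_pca(target, background, alpha=1.0, n_components=1)
# The top contrastive direction picks out feature 1, the target-specific one.
```

The weight `alpha` controls how strongly reference-dataset variance is penalized; in the loss-based setting described in the abstract, an analogous term in the objective plays this role of filtering out features shared with the reference data.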