Clustering remains an important and challenging task. Recent studies have reported impressive results by clustering feature representations learned through self-supervised learning, particularly on small datasets. However, on datasets with a large number of clusters, such as ImageNet, current methods struggle to achieve satisfactory clustering performance. In this paper, we introduce Contrastive representation Disentanglement for Clustering (CDC), a novel method that uses contrastive learning to directly disentangle the feature representation for clustering. In CDC, we decompose the representation into two distinct components: one encodes categorical information under an equipartition constraint, and the other captures instance-specific factors. To train the model, we propose a contrastive loss that uses both components of the representation. We analyze the proposed loss theoretically and show that it assigns different weights to negative samples while disentangling the feature representation. Further analysis of the gradients shows that larger weights place stronger emphasis on hard negative samples. As a result, the proposed loss is highly expressive and enables efficient disentanglement of categorical information. In experiments on various benchmark datasets, our method achieves state-of-the-art or highly competitive clustering performance. Notably, on the complete ImageNet dataset, we achieve an accuracy of 53.4%, surpassing existing methods by a substantial margin of +10.2%.
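The two-component decomposition and the equipartition constraint described above can be illustrated with a minimal NumPy sketch. The abstract does not say how CDC splits the representation or how it enforces equipartition, so everything here is an assumption made for illustration: the function names `split_representation` and `sinkhorn_equipartition` are hypothetical, the split is done by simple slicing, and equipartition is approximated with Sinkhorn-Knopp normalization, a technique commonly used for balanced cluster assignments and swapped in here rather than taken from the paper.

```python
import numpy as np


def split_representation(features, num_clusters):
    """Split each feature row into a categorical part and an instance part.

    Assumption: the first `num_clusters` dimensions act as cluster logits and
    the remaining dimensions hold instance-specific factors.
    """
    logits = features[:, :num_clusters]
    instance = features[:, num_clusters:]
    # Softmax over the cluster logits, shifted for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Scale the instance part to unit L2 norm, as is common in contrastive setups.
    instance = instance / np.linalg.norm(instance, axis=1, keepdims=True)
    return probs, instance


def sinkhorn_equipartition(logits, num_iters=100, temperature=1.0):
    """Soft assignments whose cluster totals are approximately equal.

    Assumption: Sinkhorn-Knopp stands in for whatever mechanism CDC uses.
    Alternating column and row normalization pushes every cluster toward the
    same total mass while each sample keeps a total mass of one.
    """
    n, k = logits.shape
    q = np.exp((logits - logits.max()) / temperature)
    q /= q.sum()
    for _ in range(num_iters):
        q /= q.sum(axis=0, keepdims=True)  # each cluster column sums to 1
        q /= k                             # ... then to 1/k
        q /= q.sum(axis=1, keepdims=True)  # each sample row sums to 1
        q /= n                             # ... then to 1/n
    return q * n  # rescale so every row is a distribution over clusters


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(64, 12))   # 64 samples, 4 cluster dims + 8 instance dims
    probs, instance = split_representation(features, num_clusters=4)
    balanced = sinkhorn_equipartition(features[:, :4])
    print("raw cluster mass:     ", probs.sum(axis=0))
    print("balanced cluster mass:", balanced.sum(axis=0))
```

In this sketch the plain softmax assignments can be arbitrarily unbalanced across clusters, whereas the Sinkhorn-normalized assignments give each cluster roughly `n / k` of the total mass, which is one way to read "equipartition".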