Deep clustering has gained significant attention for its ability to learn clustering-friendly representations without labeled data. However, previous deep clustering methods tend to treat all samples equally, neglecting both the variance in the latent distribution and the varying difficulty of classifying or clustering different samples. To address this, this paper proposes a novel end-to-end deep clustering method with diffused sampling and hardness-aware self-distillation (HaDis). Specifically, we first align one view of instances with another view via diffused sampling alignment (DSA), which helps improve intra-cluster compactness. To alleviate sampling bias, we present a hardness-aware self-distillation (HSD) mechanism that mines the hardest positive and negative samples and adaptively adjusts their weights in a self-distillation fashion, which addresses the potential imbalance in sample contributions during optimization. Further, prototypical contrastive learning is incorporated to simultaneously enhance inter-cluster separability and intra-cluster compactness. Experimental results on five challenging image datasets demonstrate the superior clustering performance of our HaDis method over state-of-the-art methods. Source code is available at https://github.com/Regan-Zhang/HaDis.
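The idea of mining the hardest positive and negative samples and reweighting them by hardness can be illustrated with a minimal NumPy sketch. This is not the HaDis implementation: the function names `mine_hardest_pairs` and `hardness_weights`, the use of cosine similarity, the pseudo-label input, the temperature `tau`, and the softmax weighting are all assumptions made for illustration. The sketch also assumes every cluster in the batch has at least two members and that at least two clusters are present.

```python
import numpy as np


def mine_hardest_pairs(z, labels):
    """Return (sim, hard_pos, hard_neg) for a batch of embeddings.

    hard_pos[i] is the index of the least similar sample sharing i's
    (pseudo-)label; hard_neg[i] is the index of the most similar sample
    with a different label. Similarity is cosine similarity (assumption).
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(z), dtype=bool)
    # Mask out invalid candidates so argmin/argmax ignore them.
    pos_sim = np.where(same & not_self, sim, np.inf)
    neg_sim = np.where(~same, sim, -np.inf)
    return sim, pos_sim.argmin(axis=1), neg_sim.argmax(axis=1)


def hardness_weights(sim, hard_pos, hard_neg, tau=0.5):
    """Turn pair hardness into per-sample weights that sum to 1.

    A positive pair is harder when its similarity is low; a negative
    pair is harder when its similarity is high. The softmax over the
    batch is an illustrative choice, not the paper's formula.
    """
    idx = np.arange(sim.shape[0])
    pos_hardness = 1.0 - sim[idx, hard_pos]
    neg_hardness = 1.0 + sim[idx, hard_neg]

    def softmax(x):
        e = np.exp((x - x.max()) / tau)
        return e / e.sum()

    return softmax(pos_hardness), softmax(neg_hardness)


if __name__ == "__main__":
    # Toy batch: three samples in cluster 0, two in cluster 1.
    z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]])
    labels = np.array([0, 0, 1, 1, 0])
    sim, hp, hn = mine_hardest_pairs(z, labels)
    w_pos, w_neg = hardness_weights(sim, hp, hn)
    print(hp, hn, w_pos, w_neg)
```

In a training loop, weights of this kind would typically scale each sample's contribution to a contrastive or distillation loss, so that samples with harder pairs influence the gradient more.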