In this paper, we present a novel deep image clustering approach termed PICI, which enforces partial information discrimination and cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which masked image modeling with two parallel augmented views is formulated. After the Transformer encoder derives the class tokens from the masked images, three partial information learning modules are further incorporated: the PISD module, which trains the auto-encoder via masked image reconstruction; the PICD module, which employs two levels of contrastive learning; and the CLI module, which enables mutual interaction between the instance-level and cluster-level subspaces. Extensive experiments on six real-world image datasets demonstrate the superior clustering performance of the proposed PICI approach over state-of-the-art deep clustering approaches. The source code is available at https://github.com/Regan-Zhang/PICI.
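To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of (a) random patch masking for one augmented view and (b) a contrastive loss applied at the instance level (rows of a feature matrix) and at the cluster level (columns of a soft-assignment matrix). It is not the authors' implementation: the function names, the 75% mask ratio, the temperature, and the choice of an NT-Xent-style loss are all assumptions made for illustration, and the actual PICI modules may differ. See the linked repository for the real code.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches left visible (hypothetical helper)."""
    num_keep = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_keep))


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent-style loss (an assumed choice): row i of view_a and
    row i of view_b form the positive pair; all other rows are negatives."""
    n = len(view_a)
    rows = [_normalize(r) for r in view_a + view_b]
    total = 0.0
    for i, anchor in enumerate(rows):
        positive = (i + n) % (2 * n)
        # Similarities to every other row, including the positive one.
        logits = [_dot(anchor, other) / temperature
                  for j, other in enumerate(rows) if j != i]
        pos_logit = _dot(anchor, rows[positive]) / temperature
        # Numerically stable log-sum-exp over the logits.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


def transpose(matrix):
    """Swap rows and columns so that each cluster becomes one row."""
    return [list(col) for col in zip(*matrix)]


def two_level_contrastive_loss(feat_a, feat_b, assign_a, assign_b):
    """Sum of an instance-level term and a cluster-level term (assumed equal weights)."""
    instance_term = nt_xent(feat_a, feat_b)
    cluster_term = nt_xent(transpose(assign_a), transpose(assign_b))
    return instance_term + cluster_term
```

In this sketch, `feat_a` and `feat_b` stand for per-image features of the two views (for example, projected class tokens), while `assign_a` and `assign_b` stand for per-image soft cluster assignments. Transposing the assignment matrices turns each cluster into one row, so the same loss can contrast clusters instead of images.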