Estimating the number of clusters and the cluster structure in unlabeled, complex, high-dimensional datasets (such as images) is challenging for traditional clustering algorithms. In recent years, a matrix-reordering-based algorithm called Visual Assessment of cluster Tendency (VAT) and its variants have been adopted by researchers in many domains to estimate the number of clusters and the inherent cluster structure present in data. However, these algorithms struggle with image data because they fail to capture the crucial features inherent in images. To overcome these limitations, we propose a deep-learning-based framework for assessing cluster structure in complex image datasets. Our approach uses a self-supervised deep neural network to generate representative embeddings of the data. These embeddings are then reduced to two dimensions using t-distributed Stochastic Neighbour Embedding (t-SNE) and fed into VAT-based algorithms to estimate the underlying cluster structure. Importantly, our framework does not require any prior knowledge of the number of clusters. On four benchmark image datasets (MNIST, FMNIST, CIFAR-10, and INTEL), our approach outperforms state-of-the-art VAT-family algorithms and two other deep clustering algorithms.
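For readers unfamiliar with VAT, its core reordering step can be sketched briefly. This is a minimal sketch of the classic VAT ordering (a Prim-like traversal of the pairwise dissimilarity matrix), assuming Euclidean distances on points that have already been reduced to 2-D. The self-supervised embedding and t-SNE stages are not shown, and the function name `vat_order` is hypothetical, not the authors' implementation.

```python
import numpy as np

def vat_order(X):
    """Hypothetical helper: return the VAT permutation and the reordered
    dissimilarity matrix for points X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean dissimilarity matrix R (assumed metric)
    diff = X[:, None, :] - X[None, :, :]
    R = np.sqrt((diff ** 2).sum(axis=-1))
    # Start from a row that contains the largest dissimilarity
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    remaining = [k for k in range(n) if k != i]
    # Prim-like step: repeatedly append the unselected point
    # closest to any already-selected point
    while remaining:
        sub = R[np.ix_(order, remaining)]
        _, col = np.unravel_index(np.argmin(sub), sub.shape)
        order.append(remaining.pop(int(col)))
    order = np.array(order)
    # Reorder rows and columns; clusters appear as low-valued diagonal blocks
    return order, R[np.ix_(order, order)]
```

Displaying the returned matrix with a grayscale colormap (for example matplotlib's `imshow(..., cmap="gray")`) gives the VAT image, in which dark blocks along the diagonal suggest clusters and their count. In the framework described above, `X` would presumably be the 2-D t-SNE output of the learned embeddings, for instance from scikit-learn's `TSNE(n_components=2).fit_transform(...)`.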