Powered by the feature extraction ability of neural networks, deep clustering has achieved great success in analyzing high-dimensional, complex real-world data. The performance of deep clustering methods depends on various factors, such as network structure and learning objective. However, as this survey points out, the essence of deep clustering lies in the incorporation and utilization of prior knowledge, which existing works have largely overlooked. From pioneering deep clustering methods based on data structure assumptions to recent contrastive clustering methods based on data augmentation invariances, the development of deep clustering intrinsically corresponds to the evolution of prior knowledge. In this survey, we provide a comprehensive review of deep clustering methods, categorizing them by six types of prior knowledge. We find that, in general, innovation in priors follows two trends: i) from mining to constructing, and ii) from internal to external. In addition, we provide a benchmark on five widely used datasets and analyze the performance of methods with diverse priors. By offering a novel prior-knowledge perspective, we hope this survey will yield fresh insights and inspire future research in the deep clustering community.