Clustering, as an unsupervised technique, plays a pivotal role in many data analysis applications. Among clustering algorithms, spectral clustering on Euclidean spaces has been studied extensively. However, as data grow more complex, Euclidean space is proving inefficient both for representing the data and for the learning algorithms that operate on it. Although deep neural networks on hyperbolic spaces have recently gained traction, clustering algorithms and other non-deep machine learning models on non-Euclidean spaces remain underexplored. In this paper, we propose a spectral clustering algorithm on hyperbolic spaces to address this gap. Hyperbolic spaces are well suited to representing complex data structures, such as hierarchical and tree-like structures, that cannot be embedded efficiently in Euclidean spaces. Our proposed algorithm replaces the Euclidean similarity matrix with an appropriate hyperbolic similarity matrix and demonstrates improved efficiency compared to clustering in Euclidean spaces. Our contributions are the development of the spectral clustering algorithm on hyperbolic spaces and a proof of its weak consistency. We show that our algorithm converges at least as fast as spectral clustering on Euclidean spaces. To illustrate the efficacy of our approach, we present experimental results on the Wisconsin Breast Cancer Dataset, which highlight the superior performance of hyperbolic spectral clustering over its Euclidean counterpart. This work opens avenues for using non-Euclidean spaces in clustering algorithms, offering new perspectives for handling complex data structures and improving clustering efficiency.
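The central step described in the abstract, swapping the Euclidean similarity matrix for a hyperbolic one, can be illustrated with a minimal sketch. Everything specific here is an assumption and not taken from the paper: the Poincaré ball model of hyperbolic space, a Gaussian-type kernel on the geodesic distance with bandwidth `sigma`, and a two-cluster split obtained by power iteration on the normalized affinity matrix. The function names are hypothetical, and the paper's actual similarity matrix and clustering step may differ.

```python
import math
import random


def poincare_distance(x, y):
    """Geodesic distance between two points of the open unit (Poincare) ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y))
    # Clamp at 1.0 to guard acosh against round-off.
    return math.acosh(max(1.0, arg))


def hyperbolic_similarity(points, sigma=1.0):
    """Similarity matrix exp(-d^2 / (2 sigma^2)) with d the hyperbolic distance.

    The Gaussian form of the kernel is an assumption made for illustration.
    """
    n = len(points)
    return [
        [
            math.exp(-poincare_distance(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def spectral_bipartition(W, iters=500, seed=0):
    """Split the data into two clusters from a symmetric similarity matrix W.

    Uses power iteration on the shifted normalized affinity
    (I + D^{-1/2} W D^{-1/2}) / 2, whose eigenvalues lie in [0, 1],
    after projecting out its known top eigenvector (proportional to sqrt(degree)).
    The sign pattern of the resulting second eigenvector gives the labels.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    M = [
        [
            0.5 * W[i][j] / math.sqrt(deg[i] * deg[j]) + (0.5 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(d) / norm for d in deg]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Deflate: remove the component along the top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        v = [a - dot * b for a, b in zip(v, top)]
        # Multiply by M and renormalize.
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    return [1 if a > 0 else 0 for a in v]


if __name__ == "__main__":
    # Toy data: two small groups on opposite sides of the Poincare disk.
    points = [
        (0.50, 0.00), (0.55, 0.05), (0.50, -0.05),
        (-0.50, 0.00), (-0.55, 0.05), (-0.50, -0.05),
    ]
    W = hyperbolic_similarity(points, sigma=1.0)
    print(spectral_bipartition(W))
```

For more than two clusters, one would typically take the leading k eigenvectors of the normalized Laplacian and run k-means on their rows; the two-cluster sign split is used here only to keep the sketch dependency-free.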