Clustering algorithms play a pivotal role in unsupervised learning by grouping similar objects according to shared characteristics. Traditional techniques, such as hard and fuzzy center-based clustering, are widely used but struggle with complex, high-dimensional, and non-Euclidean datasets. In particular, the Fuzzy $C$-Means (FCM) algorithm, despite its efficiency and popularity, exhibits notable limitations in non-Euclidean spaces. Euclidean geometry implicitly assumes linear separability and uniform distance scaling, which limits how well fuzzy clustering can capture complex or hierarchical structure. To overcome these challenges, we introduce Filtration-based Hyperbolic Fuzzy $C$-Means (HypeFCM), a novel clustering algorithm designed to better represent data relationships in non-Euclidean spaces. HypeFCM integrates the principles of fuzzy clustering with hyperbolic geometry and employs a weight-based filtering mechanism to improve performance. The algorithm initializes weights using a Dirichlet distribution and iteratively refines cluster centroids and membership assignments using a hyperbolic metric in the Poincar\'e disc model. Extensive experimental evaluations on $6$ synthetic and $12$ real-world datasets demonstrate that HypeFCM significantly outperforms conventional fuzzy clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.
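To make the ingredients named above concrete, the following is a minimal Python sketch, not the authors' implementation. It combines the standard Poincar\'e-model distance with the textbook FCM membership update, with the Euclidean distance swapped for the hyperbolic one. The function names, the symmetric Dirichlet parameter, and the fuzzifier `m=2.0` are illustrative assumptions, as is reading the Dirichlet-initialized "weights" as per-point membership rows. The weight-based filtering step and the centroid update are not specified in this abstract and are deliberately left out.

```python
import numpy as np


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between points of the open unit ball (Poincare model)."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    # eps guards against division by zero for points very close to the boundary.
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def init_memberships(n_points, n_clusters, rng):
    """Assumption: each point's membership row is drawn from a symmetric Dirichlet."""
    return rng.dirichlet(np.ones(n_clusters), size=n_points)


def update_memberships(points, centroids, m=2.0, eps=1e-9):
    """Textbook FCM membership update using the Poincare distance (illustrative)."""
    # dist[i, j] = hyperbolic distance from point i to centroid j.
    dist = poincare_distance(points[:, None, :], centroids[None, :, :])
    dist = np.maximum(dist, eps)  # avoid 0 ** negative power when a point sits on a centroid
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
points = np.array([[0.10, 0.00], [0.12, 0.05], [-0.70, 0.10], [-0.65, -0.05]])
centroids = np.array([[0.10, 0.02], [-0.68, 0.00]])

u0 = init_memberships(len(points), len(centroids), rng)  # random start, rows sum to 1
u1 = update_memberships(points, centroids)                # one distance-based refinement
```

Each row of `u0` and `u1` sums to one, so the entries can be read as degrees of membership. In a full algorithm the membership update would alternate with a centroid update that also respects the hyperbolic geometry.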