Clustering algorithms play a pivotal role in unsupervised learning by grouping similar objects according to shared characteristics. Traditional techniques, such as hard and fuzzy center-based clustering, are widely used but struggle with complex, high-dimensional, and non-Euclidean datasets. In particular, the Fuzzy $C$-Means (FCM) algorithm, despite its efficiency and popularity, exhibits notable limitations in non-Euclidean spaces. Euclidean geometry implicitly assumes linear separability and uniform distance scaling, which limits its ability to capture complex or hierarchical structure in fuzzy clustering. To overcome these challenges, we introduce Filtration-based Hyperbolic Fuzzy $C$-Means (HypeFCM), a novel clustering algorithm designed to better represent data relationships in non-Euclidean spaces. HypeFCM combines the principles of fuzzy clustering with hyperbolic geometry and employs a weight-based filtering mechanism to improve performance. The algorithm initializes the weights from a Dirichlet distribution and then iteratively refines the cluster centroids and membership assignments using a hyperbolic metric in the Poincar\'e disc model. Extensive experimental evaluations demonstrate that HypeFCM significantly outperforms conventional fuzzy clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.
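The two ingredients named in the abstract, Dirichlet initialization and a hyperbolic metric on the Poincar\'e disc, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the symmetric concentration parameter `alpha`, the fuzzifier `m`, and the guard `eps` are all assumptions introduced here. The membership update is the standard FCM rule with the Poincar\'e distance substituted for the Euclidean one. The filtration step and the centroid update are omitted because the abstract does not specify them.

```python
import math
import random


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit disc."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    # Clamp to 1.0 so rounding error cannot push acosh outside its domain.
    return math.acosh(max(arg, 1.0))


def dirichlet_memberships(n_points, n_clusters, alpha=1.0, rng=random):
    """One Dirichlet(alpha, ..., alpha) draw per point; each row sums to 1.

    `alpha` is an assumed symmetric concentration parameter.
    """
    rows = []
    for _ in range(n_points):
        # Normalised independent Gamma(alpha, 1) variates follow a Dirichlet law.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_clusters)]
        total = sum(gammas)
        rows.append([g / total for g in gammas])
    return rows


def update_memberships(points, centroids, m=2.0, eps=1e-12):
    """Standard FCM membership update with the hyperbolic distance swapped in."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # eps guards against division by zero when a point sits on a centroid.
        inv = [(poincare_distance(p, c) + eps) ** (-power) for c in centroids]
        total = sum(inv)
        memberships.append([w / total for w in inv])
    return memberships


# Hypothetical toy data: two tight groups inside the unit disc.
rng = random.Random(0)
points = [(0.10, 0.20), (0.15, 0.25), (-0.60, -0.50), (-0.55, -0.60)]
centroids = [(0.12, 0.22), (-0.58, -0.55)]
initial = dirichlet_memberships(len(points), len(centroids), rng=rng)
refined = update_memberships(points, centroids)
```

In this sketch each row of `initial` is a random point on the probability simplex, while each row of `refined` gives more weight to whichever centroid is nearer in hyperbolic distance. Because the Poincar\'e metric stretches distances near the boundary of the disc, points far from the origin are separated more sharply than they would be under the Euclidean metric.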