In this work, we introduce a nonparametric stopping rule for clustering based on the spatial median. The proposed method seeks a balance between homogeneity within clusters and heterogeneity between clusters: it maximises the ratio of between-cluster variation to within-cluster variation while adjusting for the number of clusters and the number of observations. The algorithm does not rely on distributional assumptions and is robust to the presence of outliers. We validated the algorithm through simulations and further evaluated its stability and efficacy on three real-world datasets. We also compared its performance with that of 13 traditional algorithms for determining the number of clusters, and found that it outperformed 11 of them. These findings indicate that the proposed method is a reliable alternative for determining the number of clusters in multivariate data.
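As a rough illustration of the kind of criterion described above, the sketch below computes a spatial median with the Weiszfeld iteration and a ratio of between-cluster to within-cluster variation around spatial medians. The exact statistic is not given in this abstract, so the Calinski-Harabasz-style adjustment by (K - 1) and (n - K), the use of unsquared Euclidean distances, and the names `spatial_median` and `median_ratio_index` are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def spatial_median(X, tol=1e-8, max_iter=500):
    """Approximate the spatial (geometric) median of the rows of X
    with the Weiszfeld iteration."""
    X = np.asarray(X, dtype=float)
    m = X.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        # Guard against division by zero when the iterate hits a data point.
        w = 1.0 / np.maximum(d, 1e-12)
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m


def median_ratio_index(X, labels):
    """Hypothetical index: between-cluster over within-cluster variation
    around spatial medians, adjusted by (K - 1) and (n - K).
    The adjustment is an assumed, Calinski-Harabasz-style form."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of observations")
    overall = spatial_median(X)
    between = 0.0
    within = 0.0
    for c in clusters:
        pts = X[labels == c]
        centre = spatial_median(pts)
        between += len(pts) * np.linalg.norm(centre - overall)
        within += np.linalg.norm(pts - centre, axis=1).sum()
    return (between / (k - 1)) / (within / (n - k))


# Toy data (synthetic, for illustration only): three well-separated blobs.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])

labels3 = np.repeat([0, 1, 2], 30)            # the generating partition
labels2 = np.where(labels3 == 2, 1, labels3)  # two blobs merged
labels4 = labels3.copy()
labels4[(labels3 == 0) & (X[:, 0] > 0)] = 3   # one blob split in two

scores = {
    k: median_ratio_index(X, lab)
    for k, lab in ((2, labels2), (3, labels3), (4, labels4))
}
best_k = max(scores, key=scores.get)  # candidate K with the largest index
```

In practice the candidate partitions would come from a clustering algorithm run for each K rather than from hand-built labels; the stopping rule then selects the K with the largest index value.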