In this work, we introduce a nonparametric stopping rule for clustering based on the spatial median. The proposed method seeks a balance between homogeneity within clusters and heterogeneity between clusters: it maximises the ratio of between-cluster variation to within-cluster variation while adjusting for the number of clusters and the number of observations. The algorithm is robust to departures from distributional assumptions and to the presence of outliers. We validated the algorithm through simulations and further evaluated its stability and efficacy on three real-world datasets. We also compared its performance with that of 13 traditional algorithms for determining the number of clusters, and found that it outperformed 11 of them. These findings indicate that the proposed method provides a reliable alternative for determining the number of clusters in multivariate data.
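To make the idea of a ratio-type stopping rule built on the spatial median concrete, the following is a minimal sketch and not the authors' exact criterion, which the abstract does not state. It assumes unsquared Euclidean distances, a Calinski-Harabasz-style adjustment factor of (n - k)/(k - 1), and candidate partitions supplied by any clustering method; the names `spatial_median`, `median_ratio_index`, and `choose_k` are hypothetical.

```python
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=200):
    """Approximate the spatial (geometric) median of the rows of X
    with the Weiszfeld iteration, starting from the coordinate mean."""
    m = X.mean(axis=0)
    for _ in range(max_iter):
        d = np.linalg.norm(X - m, axis=1)
        d = np.maximum(d, 1e-12)          # guard against division by zero
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def median_ratio_index(X, labels):
    """Assumed index: between-cluster over within-cluster variation,
    both measured around spatial medians, scaled by (n - k) / (k - 1)."""
    n = X.shape[0]
    clusters = np.unique(labels)
    k = len(clusters)
    overall = spatial_median(X)
    between, within = 0.0, 0.0
    for c in clusters:
        Xc = X[labels == c]
        mc = spatial_median(Xc)
        between += len(Xc) * np.linalg.norm(mc - overall)
        within += np.linalg.norm(Xc - mc, axis=1).sum()
    return (between / (k - 1)) / (within / (n - k))

def choose_k(X, labelings):
    """Return the k whose partition maximises the index.
    `labelings` maps each candidate k >= 2 to a label array."""
    return max(labelings, key=lambda k: median_ratio_index(X, labelings[k]))

# Toy usage: three well-separated Gaussian blobs with hand-built partitions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in centers])
true = np.repeat([0, 1, 2], 50)

labels2 = true.copy()
labels2[true == 2] = 1                    # merge two blobs into one cluster
labels4 = true.copy()
labels4[:25] = 3                          # split the first blob arbitrarily

best_k = choose_k(X, {2: labels2, 3: true, 4: labels4})
print(best_k)
```

In practice the candidate partitions would come from a clustering algorithm run over a range of k; the hand-built partitions here only keep the sketch self-contained.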