We introduce a new clustering method based on Cluster Catch Digraphs (CCDs). The method addresses the limitations of RK-CCDs by replacing the Ripley's K function they use with a new variant of the spatial randomness test based on the nearest neighbor distance (NND). We conduct a comprehensive Monte Carlo analysis to assess its performance, considering dimensionality, data set size, number of clusters, cluster volumes, and inter-cluster distance. The method is particularly effective for high-dimensional data sets, performing comparably to or better than KS-CCDs and RK-CCDs, which rely on a KS-type statistic and the Ripley's K function, respectively. We also evaluate the method on real and complex data sets and compare it with well-known clustering methods. Again, it exhibits competitive performance, producing high-quality clusters with desirable properties.

Keywords: Graph-based clustering, Cluster catch digraphs, High-dimensional data, Nearest neighbor distance, Spatial randomness test
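To illustrate the general idea of a spatial randomness test built on the nearest neighbor distance, the following is a minimal sketch and not the test proposed in this work. It assumes a generic Monte Carlo scheme: the observed mean NND is compared with the mean NND of points drawn uniformly from the bounding box of the data, and a small p-value suggests clustering. The function names `mean_nnd` and `nnd_csr_pvalue`, the bounding-box null model, and the default of 199 simulations are all illustrative assumptions.

```python
import numpy as np


def mean_nnd(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    return dists.min(axis=1).mean()


def nnd_csr_pvalue(points, n_sim=199, seed=0):
    """One-sided Monte Carlo p-value against uniformity in the bounding box.

    Assumption: the null model is a uniform distribution over the axis-aligned
    bounding box of the data. Clustered data have a smaller mean NND than the
    null, so small p-values indicate clustering.
    """
    rng = np.random.default_rng(seed)
    lo, hi = points.min(axis=0), points.max(axis=0)
    observed = mean_nnd(points)
    simulated = np.array([
        mean_nnd(rng.uniform(lo, hi, size=points.shape))
        for _ in range(n_sim)
    ])
    # Rank of the observed statistic among the simulated ones.
    return (1 + np.sum(simulated <= observed)) / (n_sim + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clustered = np.vstack([
        rng.normal([0.2, 0.2], 0.01, size=(30, 2)),
        rng.normal([0.8, 0.8], 0.01, size=(30, 2)),
    ])
    print("p-value (clustered):", nnd_csr_pvalue(clustered))
```

The brute-force distance matrix costs O(n^2) memory and is only suitable for small samples; a k-d tree or similar structure would be a natural replacement for larger data sets.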