Cluster analysis, or clustering, plays a crucial role across numerous scientific and engineering domains. Despite the wealth of clustering methods proposed over the past decades, each method is typically designed for specific scenarios and has limitations in practical applications. In this paper, we propose depth-based local center clustering (DLCC). This novel method makes use of data depth, which is known to produce a center-outward ordering of sample points in a multivariate space. However, data depth typically fails to capture the multimodal characteristics of the data, which is essential in the context of clustering. To overcome this, DLCC uses a local version of data depth that is computed on subsets of the data. From this, local centers can be identified, as well as clusters of varying shapes. Furthermore, we propose a new internal metric, based on density-based clustering, to evaluate clustering performance on non-convex clusters. Overall, DLCC is a flexible clustering approach that appears to overcome some limitations of traditional clustering methods, thereby enhancing data analysis capabilities across a wide range of application scenarios.
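The contrast between global and local depth can be illustrated with a minimal sketch. This is not the DLCC algorithm: the abstract does not specify which depth function or which subsets are used, so the Mahalanobis depth, the k-nearest-neighbor subsets, the value of `k`, and the function names `mahalanobis_depth` and `local_depth` are all assumptions made here for illustration.

```python
import numpy as np


def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` relative to the sample `reference`.

    Depth is 1 / (1 + squared Mahalanobis distance to the reference mean),
    so it is largest at the center and decreases outward.
    """
    mean = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates singular covariances
    diff = points - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return 1.0 / (1.0 + d2)


def local_depth(data, k):
    """Depth of each point relative to its own k nearest neighbors (itself included).

    Assumption: the "subset" for a point is its k-nearest-neighbor set.
    """
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    return np.array(
        [mahalanobis_depth(data[i:i + 1], data[idx])[0]
         for i, idx in enumerate(neighbors)]
    )


# Hypothetical bimodal sample: two Gaussian blobs centered at (-5, 0) and (5, 0).
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=(-5.0, 0.0), scale=1.0, size=(100, 2)),
    rng.normal(loc=(5.0, 0.0), scale=1.0, size=(100, 2)),
])

global_depths = mahalanobis_depth(data, data)  # one center-outward ordering for all points
local_depths = local_depth(data, k=20)         # ordering relative to each neighborhood
```

In this sketch the global depth ranks every point against a single center located between the two blobs, whereas the local depth ranks each point against its own neighborhood, so points near the middle of either blob can receive high values. The brute-force pairwise distance matrix is chosen for brevity and is only suitable for small samples.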