We introduce CLASSIX, a fast and explainable clustering method. It consists of two phases: a greedy aggregation of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters: a distance parameter for the aggregation and a parameter controlling the minimal cluster size. Extensive experiments on synthetic and real-world datasets, covering various cluster shapes and low to high feature dimensionality, give a comprehensive evaluation of the clustering performance. These experiments demonstrate that CLASSIX competes with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.
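To make the two phases concrete, the following is a minimal NumPy sketch of a sort-aggregate-merge scheme of this kind. It is not the CLASSIX reference implementation, and several details are assumptions made for illustration only: the data are sorted by their projection onto the first principal axis, two groups are merged when their starting points lie within `scale * radius` of each other, and clusters smaller than `min_size` are simply labelled `-1`. The names `aggregate`, `merge`, `radius`, `min_size`, and `scale` are hypothetical.

```python
import numpy as np


def aggregate(X, radius):
    """Phase 1 (sketch): greedily group sorted points around starting points."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Assumption: sort by the projection onto the first principal axis.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    key = Xc @ vt[0]
    order = np.argsort(key)
    Xs, ks = Xc[order], key[order]

    n = len(Xs)
    group = np.full(n, -1)
    starts = []  # indices (in sorted order) of the group starting points
    for i in range(n):
        if group[i] >= 0:
            continue  # already absorbed by an earlier group
        gid = len(starts)
        starts.append(i)
        group[i] = gid
        for j in range(i + 1, n):
            # The projection onto a unit vector never exceeds the Euclidean
            # distance, so once the sort keys differ by more than `radius`
            # no later point can be within `radius` of the starting point.
            if ks[j] - ks[i] > radius:
                break
            if group[j] < 0 and np.linalg.norm(Xs[j] - Xs[i]) <= radius:
                group[j] = gid

    labels = np.empty(n, dtype=int)
    labels[order] = group  # map group ids back to the original row order
    return labels, X[order[starts]]


def merge(labels, starts, radius, min_size, scale=1.5):
    """Phase 2 (sketch): merge groups with nearby starting points."""
    g = len(starts)
    parent = list(range(g))  # union-find forest over groups

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Assumption: merge when starting points are within scale * radius.
    # This all-pairs loop is quadratic in the number of groups and is
    # written for clarity, not speed.
    for a in range(g):
        dist = np.linalg.norm(starts - starts[a], axis=1)
        for b in np.nonzero(dist <= scale * radius)[0]:
            parent[find(a)] = find(int(b))

    roots = np.array([find(a) for a in range(g)])
    _, cluster_of_group = np.unique(roots, return_inverse=True)
    clusters = cluster_of_group[labels]

    # Simplification: clusters below the minimal size are marked as -1.
    sizes = np.bincount(clusters)
    return np.where(sizes[clusters] >= min_size, clusters, -1)
```

Calling `labels, starts = aggregate(X, radius)` followed by `merge(labels, starts, radius, min_size)` returns one cluster label per row of `X`. The early `break` in the inner loop is what the sorting buys: each starting point only inspects points whose sort key lies within `radius` of its own, which is one plausible reason such a scheme can run in near-linear time on many datasets.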