DBSCAN is a widely used density-based clustering algorithm. However, as demand for multi-density clustering grows, traditional DBSCAN fails to produce good clustering results on multi-density datasets. To address this problem, this paper proposes an adaptive multi-density DBSCAN algorithm (AMD-DBSCAN). AMD-DBSCAN uses an improved parameter adaptation method to search for multiple parameter pairs (i.e., Eps and MinPts), the key parameters that determine clustering results and performance, thereby allowing the model to be applied to multi-density datasets. Moreover, AMD-DBSCAN requires only one hyperparameter, which avoids complicated, repetitive initialization. Furthermore, the variance of the number of neighbors (VNN) is proposed to measure the difference in density between clusters. Experimental results show that AMD-DBSCAN reduces execution time by an average of 75% compared with the traditional adaptive algorithm, owing to its lower algorithmic complexity. In addition, AMD-DBSCAN improves accuracy by 24.7% on average over the state-of-the-art design on multi-density datasets of extremely variable density, with no performance loss in single-density scenarios. Our code and datasets are available at https://github.com/AlexandreWANG915/AMD-DBSCAN.
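To make the VNN idea concrete, the following is a minimal sketch rather than the implementation from the AMD-DBSCAN repository. The abstract does not give the exact definition, so the sketch assumes that VNN is the population variance of the number of neighbors each point has within a fixed radius. The function names `neighbor_counts` and `vnn` and the `radius` parameter are hypothetical and chosen for illustration only.

```python
import math
from statistics import pvariance


def neighbor_counts(points, radius):
    """For each point, count the other points within `radius` (Euclidean distance)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def vnn(points, radius):
    """Assumed definition: population variance of the per-point neighbor counts."""
    return float(pvariance(neighbor_counts(points, radius)))


# Four corners of a unit square: with radius 1.0 every point has exactly
# two neighbors, so the neighbor counts do not vary.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Adding one distant point with no neighbors makes the counts uneven.
with_isolated_point = square + [(10.0, 10.0)]

print(vnn(square, 1.0))
print(vnn(with_isolated_point, 1.0))
```

Under this assumed definition, a dataset whose points all see the same number of neighbors has a VNN of zero, and the value grows as regions of different density are mixed, which is the kind of density difference the abstract says VNN is meant to capture.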