We propose a fast, dynamic algorithm for Density-Based Spatial Clustering of Applications with Noise (DBSCAN) that efficiently supports online updates. Traditional DBSCAN algorithms are designed for batch processing and become computationally expensive on dynamic datasets, particularly in large-scale applications where the data evolves continuously. To address this challenge, our algorithm builds on the Euler tour tree data structure, which allows clusters to be updated dynamically without reprocessing the entire dataset. This approach preserves the near-optimal accuracy in density estimation achieved by the state-of-the-art static DBSCAN method (Esfandiari et al., 2021). Our method achieves an improved time complexity of $O(d \log^3(n) + \log^4(n))$ for every data point insertion and deletion, where $n$ and $d$ denote the total number of updates and the data dimension, respectively. Empirical studies also demonstrate significant speedups over conventional DBSCAN implementations in real-time clustering of dynamic datasets, while maintaining comparable or superior clustering quality.
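To illustrate why Euler tour trees suit dynamic updates, the following minimal sketch shows the underlying idea: a tree is stored as the sequence of vertices visited by a depth-first walk, so the subtree below any vertex occupies one contiguous span, and removing an edge amounts to splitting that sequence. This is not the implementation proposed here. The function names `euler_tour` and `cut` are hypothetical, and the sketch keeps the tour in a plain Python list, so each operation costs linear time; an actual Euler tour tree stores the sequence in a balanced search tree to obtain polylogarithmic updates.

```python
from collections import defaultdict


def euler_tour(edges, root):
    """Return the Euler tour of an undirected tree as a list of vertices.

    Each vertex is recorded when first entered and again after
    returning from each of its children.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    tour = []

    def visit(node, parent):
        tour.append(node)
        for child in adj[node]:
            if child != parent:
                visit(child, node)
                tour.append(node)  # back at `node` after the child's subtree

    visit(root, None)
    return tour


def cut(tour, child):
    """Split a tour at the edge between `child` and its parent.

    `child` must not be the root of the tour. Returns a pair
    (tour of the remaining tree, tour of the detached subtree).
    """
    first = tour.index(child)
    last = len(tour) - 1 - tour[::-1].index(child)
    subtree = tour[first:last + 1]
    # Skip the parent's occurrence right after the subtree span so the
    # parent is not listed twice in a row in the remaining tour.
    rest = tour[:first] + tour[last + 2:]
    return rest, subtree


if __name__ == "__main__":
    tour = euler_tour([(0, 1), (0, 2), (1, 3)], root=0)
    print(tour)
    print(cut(tour, 1))
```

In a dynamic clustering setting, a cut of this kind would correspond to a cluster breaking apart after a deletion, and the reverse operation (concatenating two tours) to two clusters merging after an insertion.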