DBSCAN, a well-known density-based clustering algorithm, is widely used because it can identify clusters of arbitrary shape and handle noisy data. However, it struggles to produce satisfactory clusterings on datasets with varying density scales, which are common in real-world applications. In this paper, we propose AR-DBSCAN, a novel Adaptive and Robust DBSCAN clustering framework based on multi-agent reinforcement learning. First, we model the initial dataset as a two-level encoding tree and divide the data vertices into distinct density partitions according to the information uncertainty determined by the encoding tree. Each partition is then assigned to an agent that finds the best clustering parameters without manual assistance. This allocation is density-adaptive: by using distinct agents for different partitions, AR-DBSCAN can effectively handle the diverse density distributions within a dataset. Second, we design an automatic parameter search process guided by multi-agent deep reinforcement learning. The process of adjusting the parameter search direction by perceiving the clustering environment is modeled as a Markov decision process. With a policy network trained on a weakly-supervised reward, each agent adaptively learns the optimal clustering parameters by interacting with the clusters. Third, we present a recursive search mechanism that adapts to the scale of the data, enabling efficient and controlled exploration of large parameter spaces. We conduct extensive experiments on nine artificial datasets and one real-world dataset. The results on offline and online tasks show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% in NMI and 175.3% in ARI, but also robustly finds dominant parameters.
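The idea behind the recursive search (evaluate a coarse grid of parameters, then zoom in around the best candidate) can be illustrated with a much simpler stand-in. The sketch below is not AR-DBSCAN: it drops the encoding-tree partitioning and the reinforcement-learning agents, searches only DBSCAN's `eps` with `MinPts` held fixed, and scores each candidate with a hypothetical weak-supervision reward (pairwise agreement with a handful of labelled points). The names `dbscan`, `pair_agreement` and `recursive_search`, and the `depth`/`steps` knobs, are illustrative assumptions.

```python
import math
from itertools import combinations


def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    # eps-neighbourhood of every point (each neighbourhood includes the point itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point -> border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep growing the cluster
    return labels


def pair_agreement(labels, known):
    """Hypothetical weak-supervision reward: Rand index restricted to the few
    labelled points in `known` (index -> class). Noise never shares a cluster."""
    pairs = list(combinations(sorted(known), 2))
    agree = sum(
        (labels[a] == labels[b] and labels[a] != -1) == (known[a] == known[b])
        for a, b in pairs
    )
    return agree / len(pairs)


def recursive_search(points, known, lo, hi, min_pts, depth=3, steps=5):
    """Coarse-to-fine search over eps in [lo, hi]: score `steps` evenly spaced
    values, then recurse into the interval around the best one.
    Returns (best_eps, best_score)."""
    grid = [lo + k * (hi - lo) / (steps - 1) for k in range(steps)]
    scored = [(pair_agreement(dbscan(points, e, min_pts), known), e) for e in grid]
    # Highest reward wins; ties go to the smaller eps.
    best_score, best_eps = max(scored, key=lambda t: (t[0], -t[1]))
    if depth <= 1:
        return best_eps, best_score
    width = (hi - lo) / (steps - 1)
    deeper = recursive_search(
        points, known,
        max(lo, best_eps - width), min(hi, best_eps + width),
        min_pts, depth - 1, steps,
    )
    # A finer level never replaces this level's result with a worse score.
    return deeper if deeper[1] >= best_score else (best_eps, best_score)
```

With the defaults, a call such as `recursive_search(points, known, lo=0.05, hi=20.0, min_pts=3)` runs DBSCAN 5 times per level over 3 levels (15 runs), each level narrowing the `eps` interval around the previous best. Deriving `hi` from the data, for example the largest pairwise distance, is one assumed way to make the range follow the data's scale; in AR-DBSCAN itself, each density partition would instead be handled by its own agent.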