Clustering techniques have been key drivers of data mining, machine learning, and pattern recognition for decades. DBSCAN is one of the most popular clustering algorithms because of its high accuracy and noise tolerance. However, many high-performing algorithms, DBSCAN among them, have input parameters that are hard to estimate, so finding suitable values is a time-consuming process. In this paper, we propose a novel clustering algorithm, Bacteria-Farm, which balances clustering performance against the ease of finding optimal parameters. The Bacteria-Farm algorithm is inspired by the growth of bacteria in closed experimental farms: their ability to consume food and grow closely resembles the ideal cluster growth desired in clustering algorithms. In addition, the algorithm has a modular design that allows versions of it to be created for specific tasks or data distributions. In contrast with other clustering algorithms, our algorithm also lets the user specify the amount of noise to be excluded during clustering.