Faster Parallel Exact Density Peaks Clustering

Clustering multidimensional points is a fundamental data mining task, with applications in many fields such as astronomy, neuroscience, bioinformatics, and computer vision. The goal of clustering algorithms is to group similar objects together. Density-based clustering defines clusters as dense regions of points. Because it can detect clusters of arbitrary shape, it is useful in many applications. In this paper, we propose fast parallel algorithms for Density Peaks Clustering (DPC), a popular variant of density-based clustering. Existing exact DPC algorithms suffer from low parallelism both in theory and in practice, which limits their applicability to large-scale data sets. Our most performant algorithm, which is based on priority search kd-trees, achieves $O(\log n \log\log n)$ span (parallel time complexity) for a data set of $n$ points. This algorithm is also work-efficient: its work matches that of the best existing sequential exact DPC algorithm. In addition, we present another DPC algorithm, based on a Fenwick tree, whose average-case complexity bound holds under fewer assumptions. We provide optimized implementations of our algorithms and evaluate their performance through extensive experiments. On a 30-core machine with two-way hyperthreading, our best algorithm achieves a 10.8x to 13169x speedup over the previous best parallel exact DPC algorithm. Compared with the state-of-the-art parallel approximate DPC algorithm, our best algorithm achieves a 1.5x to 4206x speedup while remaining exact.
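The abstract does not spell out what DPC actually computes. As a point of reference, here is a minimal brute-force sketch of the standard exact DPC pipeline. It makes several assumptions: the common cutoff-kernel density, where $\rho_i$ counts the other points strictly within a cutoff distance `d_cut`; density ties broken by point index; and two thresholds, `rho_min` (below which a point is noise) and `delta_min` (at or above which a non-noise point starts a new cluster). The function and parameter names are hypothetical. This is a quadratic-work baseline, not the kd-tree or Fenwick-tree algorithms proposed in the paper.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def dpc_naive(points: List[Point], d_cut: float, rho_min: int, delta_min: float) -> List[int]:
    """Brute-force exact DPC (illustrative sketch, O(n^2) work).

    Returns one cluster label per point; -1 marks noise.
    """
    n = len(points)

    # Step 1 (assumed cutoff kernel): density = number of OTHER points
    # strictly within distance d_cut.
    rho = [0] * n
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) < d_cut:
                rho[i] += 1

    # Hypothetical tie-break: j ranks above i if it is denser,
    # or equally dense with a smaller index. This makes the order total.
    def ranks_above(j: int, i: int) -> bool:
        return (rho[j], -j) > (rho[i], -i)

    # Step 2: dependent point of i = nearest point ranking above i;
    # delta[i] = distance to it (infinity for the top-ranked point).
    dep: List[Optional[int]] = [None] * n
    delta = [math.inf] * n
    for i in range(n):
        for j in range(n):
            if ranks_above(j, i):
                d = math.dist(points[i], points[j])
                if d < delta[i]:
                    delta[i], dep[i] = d, j

    # Step 3: visit points in decreasing rank, so every dependent point
    # is labeled before the points that depend on it.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    label = [-1] * n
    next_label = 0
    for i in order:
        if rho[i] < rho_min:
            continue  # too sparse: leave as noise (-1)
        if delta[i] >= delta_min:
            label[i] = next_label  # far from any denser point: new cluster center
            next_label += 1
        else:
            label[i] = label[dep[i]]  # inherit the dependent point's cluster
    return label


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
           (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
           (20, 20)]
    print(dpc_naive(pts, d_cut=0.5, rho_min=1, delta_min=1.0))
    # -> [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In the usage example, the two tight groups of four points become clusters 0 and 1, and the isolated point at (20, 20) is labeled as noise. The label-propagation step is valid because a non-noise point's dependent point is at least as dense, so it is also non-noise and has already been labeled. The nested loops in steps 1 and 2 are what make this baseline cost $O(n^2)$ work. Spatial data structures such as kd-trees are typically used to avoid exactly this kind of all-pairs scan.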

翻译：多维点聚类是一项基础的数据挖掘任务，在天文学、神经科学、生物信息学和计算机视觉等多个领域具有广泛应用。聚类算法的目标是将相似的对象分组。基于密度的聚类是一种将簇定义为高密度点区域的聚类方法，其优势在于能够检测任意形状的簇，从而在许多应用中具有实用性。本文针对密度峰值聚类（DPC）这种流行的基于密度的聚类变体，提出了一种快速并行算法。现有的精确DPC算法在理论上和实践中均存在并行度低的问题，这限制了其在大规模数据集上的应用。我们性能最优的算法基于优先搜索kd树，对于包含n个点的数据集，实现了O(log n log log n)的跨度（并行时间复杂度）。该算法也具有工作高效性，其工作复杂度与现有最佳串行精确DPC算法相匹配。此外，我们还提出了另一种基于Fenwick树的DPC算法，该算法在平均情况复杂度成立时所需假设更少。我们提供了所提算法的优化实现，并通过大量实验评估了其性能。在一台支持双向超线程的30核机器上，我们发现最优算法相比先前最佳的并行精确DPC算法实现了10.8至13169倍的加速。与最先进的并行近似DPC算法相比，我们的最优算法在保持精确性的同时，实现了1.5至4206倍的加速。