In this paper, we investigate three fundamental problems in the Massively Parallel Computation (MPC) model: (i) grid graph connectivity, (ii) approximate Euclidean Minimum Spanning Tree (EMST), and (iii) approximate DBSCAN. Our first result is an $O(1)$-round Las Vegas (i.e., succeeding with high probability) MPC algorithm for computing the connected components of a $d$-dimensional $c$-penetration grid graph ($(d,c)$-grid graph), where both $d$ and $c$ are positive integer constants. In such a grid graph, each vertex is a point with integer coordinates in $\mathbb{N}^d$, and an edge can exist only between two distinct vertices whose $\ell_\infty$-distance is at most $c$. To our knowledge, the best existing approach for computing the connected components (CCs) of $(d,c)$-grid graphs in the MPC model is to run the state-of-the-art MPC CC algorithms designed for general graphs, which take $O(\log \log n + \log D)$ [FOCS19] and $O(\log \log n + \log \frac{1}{\lambda})$ [PODC19] rounds, respectively, where $D$ is the {\em diameter} and $\lambda$ is the {\em spectral gap} of the graph. Building on our grid graph connectivity technique, our second main result is an $O(1)$-round Las Vegas MPC algorithm for computing an approximate Euclidean MST. The existing state-of-the-art result for this problem is the $O(1)$-round MPC algorithm of Andoni et al. [STOC14], which guarantees an approximation of the overall weight only in expectation. In contrast, our algorithm guarantees not only a deterministic approximation of the overall weight, but also a deterministic edge-wise weight approximation. The latter property is crucial to many applications, such as finding the Bichromatic Closest Pair and DBSCAN clustering. Last but not least, our third main result is an $O(1)$-round Las Vegas MPC algorithm for computing an approximate DBSCAN clustering in $O(1)$-dimensional space.
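As a brief illustration of the grid graph definition, and writing $V$ and $E$ for the vertex and edge sets (notation assumed here for illustration, not fixed by the text above), a $(d,c)$-grid graph $G = (V, E)$ satisfies
\[
V \subseteq \mathbb{N}^d, \qquad E \subseteq \bigl\{ \{u, v\} : u, v \in V,\ u \neq v,\ \|u - v\|_\infty \le c \bigr\}, \qquad \|u - v\|_\infty = \max_{1 \le i \le d} |u_i - v_i| .
\]
For instance, with $d = 2$ and $c = 1$, the vertex $(3,4)$ may be adjacent to $(4,5)$, since $\max(1,1) = 1 \le 1$, but not to $(5,4)$, since $\max(2,0) = 2 > 1$. Note that the condition is only necessary: two vertices within $\ell_\infty$-distance $c$ need not be joined by an edge.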