In this paper, we study parallel algorithms for the correlation clustering problem, where every pair of distinct entities is labeled as either similar or dissimilar. The goal is to partition the entities into clusters so as to minimize the number of disagreements with the labels. Currently, all efficient parallel algorithms have an approximation ratio of at least 3, which leaves a significant gap relative to the $1.994+\epsilon$ ratio achieved by polynomial-time sequential algorithms [CLN22]. We propose the first poly-logarithmic depth parallel algorithm that achieves an approximation ratio better than 3. Specifically, our algorithm computes a $(2.4+\epsilon)$-approximate solution and uses $\tilde{O}(m^{1.5})$ work. Additionally, it can be translated into a $\tilde{O}(m^{1.5})$-time sequential algorithm and into a sublinear-memory MPC algorithm with poly-logarithmic rounds and $\tilde{O}(m^{1.5})$ total memory. Inspired by the length-constrained multi-commodity flow algorithm of Awerbuch, Khandekar, and Rao [AKR12], we develop an efficient parallel algorithm to solve a truncated correlation clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. We then show that the solution of the truncated linear program can be rounded with a loss factor of at most 2.4 by using the framework of [CMSY15]. Such a rounding framework can then be implemented using parallel pivot-based approaches.
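To make the objective concrete, the following is a minimal Python sketch of the disagreement count, together with the classical sequential randomized Pivot procedure, which is known to give a 3-approximation in expectation. It is not the algorithm proposed in this paper: it involves no linear program, no LP rounding, and no parallelism. The function names `pivot_clustering` and `disagreements` and the input encoding (a node list plus a list of similar pairs, with every other pair treated as dissimilar) are assumptions made purely for illustration.

```python
import random
from itertools import combinations


def pivot_clustering(nodes, similar, rng=None):
    """Classical sequential Pivot (illustrative, not the paper's method).

    Visits nodes in a uniformly random order; each still-unclustered node
    becomes a pivot and forms a cluster with its unclustered similar neighbors.
    """
    rng = rng or random.Random(0)  # fixed seed only for reproducibility
    adj = {v: set() for v in nodes}
    for u, v in similar:
        adj[u].add(v)
        adj[v].add(u)

    order = list(nodes)
    rng.shuffle(order)

    clustered = set()
    clusters = []
    for pivot in order:
        if pivot in clustered:
            continue
        cluster = {pivot} | (adj[pivot] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(nodes, similar, clusters):
    """Count pairs whose label disagrees with the clustering.

    A similar pair split across clusters, or a dissimilar pair placed in the
    same cluster, contributes one disagreement.
    """
    cluster_of = {v: i for i, c in enumerate(clusters) for v in c}
    similar_pairs = {frozenset(e) for e in similar}
    cost = 0
    for u, v in combinations(nodes, 2):
        together = cluster_of[u] == cluster_of[v]
        is_similar = frozenset((u, v)) in similar_pairs
        if together != is_similar:
            cost += 1
    return cost


if __name__ == "__main__":
    # A path 0 - 1 - 2: the pair (0, 2) is dissimilar, so no clustering is perfect.
    nodes = [0, 1, 2]
    similar = [(0, 1), (1, 2)]
    clusters = pivot_clustering(nodes, similar)
    print(clusters, disagreements(nodes, similar, clusters))
```

On the three-node path in the usage example, every choice of first pivot yields exactly one disagreement, which is also optimal for that instance. On larger inputs the result depends on the random order, which is why the guarantee for this procedure holds only in expectation.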