Understanding the Cluster LP for Correlation Clustering

In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla~(FOCS 2002), the input is a complete graph whose edges are each labeled either $+$ or $-$, and the goal is to find a partition of the vertices that minimizes the number of $+$edges between parts plus the number of $-$edges within parts. In recent years, Chawla, Makarychev, Schramm, and Yaroslavtsev~(STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman~(FOCS 2022, 2023) finally bypassed the integrality gap of 2 of this LP, giving a $1.73$-approximation for the problem. To create a simple and unified framework for Correlation Clustering, similar to those for {\em typical} approximate optimization tasks, we propose the {\em cluster LP} as a strong linear program that might tightly capture the approximability of Correlation Clustering. It unifies all the previous relaxations for the problem. We demonstrate the power of the cluster LP by presenting a simple rounding algorithm and providing two analyses. One analytically proves a 1.49-approximation, and the other solves a factor-revealing SDP to show a 1.437-approximation. Both proofs introduce principled methods for analyzing the performance of the algorithm, resulting in a significantly improved approximation guarantee. Finally, we prove an integrality gap of $4/3$ for the cluster LP, showing that our 1.437 upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation result with ratio $24/23 \approx 1.043$; no explicit hardness ratio was known before.
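
To make the objective and the cluster LP concrete, here is a minimal Python sketch. It assumes the usual cluster LP formulation. There is one variable $z_S \ge 0$ per nonempty vertex subset $S$, with $\sum_{S \ni u} z_S = 1$ for every vertex $u$ and $x_{uv} = 1 - \sum_{S \supseteq \{u,v\}} z_S$ for every pair, and the objective is $\sum_{uv \in E^+} x_{uv} + \sum_{uv \in E^-} (1 - x_{uv})$. The function names (`clustering_cost`, `cluster_lp_value`, `partition_to_z`) are hypothetical. The code only evaluates a given solution. It does not solve the LP, which has exponentially many variables, and it does not implement the rounding algorithm described above.

```python
from fractions import Fraction
from itertools import combinations


def clustering_cost(labels, plus_edges):
    """Correlation Clustering cost of a partition.

    labels[v] is the cluster id of vertex v (vertices are 0..n-1);
    plus_edges is a set of frozenset pairs; every other pair is a -edge.
    """
    cost = 0
    for u, v in combinations(range(len(labels)), 2):
        is_plus = frozenset((u, v)) in plus_edges
        same_part = labels[u] == labels[v]
        # Pay 1 for a +edge cut between parts or a -edge kept inside a part.
        if is_plus != same_part:
            cost += 1
    return cost


def cluster_lp_value(n, plus_edges, z):
    """Objective of a (possibly fractional) cluster LP solution.

    z maps frozenset clusters S to weights z_S >= 0 (assumed formulation).
    """
    if any(w < 0 for w in z.values()):
        raise ValueError("cluster weights must be non-negative")
    # Feasibility: every vertex is covered by clusters of total weight exactly 1.
    for u in range(n):
        cover = sum(w for S, w in z.items() if u in S)
        if cover != 1:
            raise ValueError(f"vertex {u} has cover {cover}, expected 1")
    value = 0
    for u, v in combinations(range(n), 2):
        # x_uv = 1 - total weight of clusters containing both u and v.
        x_uv = 1 - sum(w for S, w in z.items() if u in S and v in S)
        if frozenset((u, v)) in plus_edges:
            value += x_uv
        else:
            value += 1 - x_uv
    return value


def partition_to_z(labels):
    """Integral cluster LP solution: z_S = 1 for each part S of the partition."""
    parts = {}
    for v, c in enumerate(labels):
        parts.setdefault(c, set()).add(v)
    return {frozenset(S): 1 for S in parts.values()}


if __name__ == "__main__":
    # Triangle with +edges 0-1 and 1-2; the pair 0-2 is a -edge.
    plus = {frozenset((0, 1)), frozenset((1, 2))}
    labels = [0, 0, 1]
    print(clustering_cost(labels, plus))                      # 1
    print(cluster_lp_value(3, plus, partition_to_z(labels)))  # 1
    # A fractional feasible point: every vertex is still covered with weight 1.
    half = Fraction(1, 2)
    z_frac = {frozenset({0, 1}): half, frozenset({1, 2}): half,
              frozenset({0}): half, frozenset({2}): half}
    print(cluster_lp_value(3, plus, z_frac))                  # 1
```

Setting $z_S = 1$ for each part of a partition reproduces exactly that partition's cost. This is why the cluster LP is a relaxation of Correlation Clustering. Exact `Fraction` weights are used so that the feasibility check `cover != 1` is not disturbed by floating-point rounding.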
