Correlation Clustering (CC) is a foundational problem in unsupervised learning that models binary similarity relations using labeled graphs. While classical CC has been widely studied, many real-world applications involve more nuanced relationships: either multi-class categorical interactions or varying confidence levels in edge labels. To address these, two natural generalizations have been proposed: Chromatic Correlation Clustering (CCC), which assigns semantic colors to edge labels, and pseudometric-weighted CC, in which the edge weights satisfy the triangle inequality. In this paper, we develop improved approximation algorithms for both settings. Our approach combines LP-based pivoting techniques with problem-specific rounding functions. For pseudometric-weighted CC, we present a tight $10/3$-approximation algorithm, matching the best bound achievable within the framework of the standard LP relaxation combined with specialized rounding. For CCC, we improve the approximation ratio from the previous best of $2.5$ to $2.15$, and we establish a lower bound of $2.11$ within the same analytical framework, showing that our result is nearly optimal in that framework.
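To make the phrase "LP-based pivoting with a rounding function" concrete, the following is a minimal sketch of the generic scheme for classical CC, not the algorithm or rounding functions of this paper. It assumes a fractional LP solution is already available as a dictionary `x` mapping each unordered vertex pair to a distance in $[0, 1]$. The names `lp_pivot`, `x`, and `round_fn` are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional, Set


def lp_pivot(
    nodes: List[Hashable],
    x: Dict[FrozenSet[Hashable], float],
    round_fn: Callable[[float], float],
    rng: Optional[random.Random] = None,
) -> List[Set[Hashable]]:
    """Generic LP-guided pivoting (illustrative sketch).

    x[frozenset((u, v))] is an assumed, precomputed LP distance in [0, 1].
    round_fn maps that distance to the probability of separating v from
    the pivot; it is a placeholder, not the paper's rounding function.
    """
    rng = rng or random.Random()
    remaining = list(nodes)
    clusters: List[Set[Hashable]] = []
    while remaining:
        # Pick a pivot uniformly at random among unclustered vertices.
        pivot = remaining[rng.randrange(len(remaining))]
        cluster = {pivot}
        for v in remaining:
            if v == pivot:
                continue
            p_separate = round_fn(x[frozenset((pivot, v))])
            # Join the pivot's cluster with probability 1 - p_separate.
            if rng.random() >= p_separate:
                cluster.add(v)
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters


# Usage on an integral "LP solution": a,b are close, c,d are close,
# every other pair is at distance 1. Identity rounding is a placeholder.
nodes = ["a", "b", "c", "d"]
x = {frozenset((u, v)): 1.0 for u in nodes for v in nodes if u < v}
x[frozenset(("a", "b"))] = 0.0
x[frozenset(("c", "d"))] = 0.0
clusters = lp_pivot(nodes, x, round_fn=lambda d: d, rng=random.Random(0))
```

In this scheme the rounding function is the only problem-specific component: the pivoting loop stays fixed, and the approximation ratio depends on how `round_fn` converts LP distances into separation probabilities.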