Graph clustering is a fundamental task in network analysis, where the goal is to detect sets of nodes that are well connected to each other but sparsely connected to the rest of the graph. We present faster approximation algorithms for LambdaCC, an NP-hard parameterized clustering framework that is governed by a tunable resolution parameter and generalizes many other clustering objectives, such as modularity, sparsest cut, and cluster deletion. Previous LambdaCC algorithms are either heuristics with no approximation guarantees or computationally expensive approximation algorithms. We provide fast new approximation algorithms that can be made purely combinatorial. They rely on a new parameterized edge labeling problem that we introduce, which generalizes previous edge labeling problems based on the principle of strong triadic closure; these earlier problems are of independent interest in social network analysis. Our methods are orders of magnitude more scalable than previous approximation algorithms, and our lower bounds yield a posteriori approximation guarantees for previous heuristics that have no approximation guarantees of their own.
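To make the role of the resolution parameter concrete, the following is a minimal sketch of how a clustering could be scored under LambdaCC. It assumes the standard unweighted formulation from the earlier LambdaCC literature, which the abstract itself does not spell out: an edge whose endpoints are separated costs 1 - λ, and a non-adjacent pair placed in the same cluster costs λ. The function name `lambdacc_objective`, its parameters, and the toy graph are hypothetical and serve only as illustration; this is not the approximation algorithm described above.

```python
from itertools import combinations


def lambdacc_objective(nodes, edges, labels, lam):
    """Disagreement cost of a clustering under the assumed standard LambdaCC form.

    nodes  : iterable of node ids
    edges  : iterable of (u, v) pairs, treated as undirected
    labels : dict mapping node id -> cluster id
    lam    : resolution parameter, assumed to lie in (0, 1)
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(nodes, 2):
        same_cluster = labels[u] == labels[v]
        if frozenset((u, v)) in edge_set:
            # An edge cut by the clustering is penalized with weight 1 - lam.
            if not same_cluster:
                cost += 1.0 - lam
        elif same_cluster:
            # A non-adjacent pair kept together is penalized with weight lam.
            cost += lam
    return cost


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
lam = 0.25

two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {v: 0 for v in nodes}

print(lambdacc_objective(nodes, edges, two_clusters, lam))  # 0.75 (one cut edge)
print(lambdacc_objective(nodes, edges, one_cluster, lam))   # 2.0 (eight non-edges kept together)
```

Under this assumed objective, an a posteriori guarantee of the kind mentioned above would be the ratio between the cost a heuristic achieves and a lower bound on the optimal cost; computing such a lower bound is the part the paper's algorithms address and is not sketched here.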