Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. A clique is often too strict a cohesive structure, since communities and biological modules rarely form cliques for reasons such as data noise. The $k$-plex has therefore been introduced as a popular clique relaxation: a graph in which every vertex is adjacent to all but at most $k$ vertices (counting itself), i.e., a graph $P$ in which every vertex has degree at least $|P| - k$. In this paper, we propose a fast branch-and-bound algorithm, together with a task-based parallel version, to enumerate all maximal $k$-plexes with at least $q$ vertices. Our algorithm adopts an effective search-space partitioning approach that yields a lower time complexity, a new pivot-vertex selection method that reduces the number of candidate vertices, an effective upper-bounding technique to prune useless branches, and three novel pruning techniques based on vertex pairs. Our parallel algorithm uses a timeout mechanism to eliminate straggler tasks and maximizes cache locality while ensuring load balancing. Extensive experiments show that, compared with state-of-the-art algorithms, our sequential and parallel algorithms enumerate large maximal $k$-plexes with speedups of up to $5\times$ and $18.9\times$, respectively. Ablation results also demonstrate that our pruning techniques bring up to a $7\times$ speedup over our basic algorithm.
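To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is *not* the branch-and-bound algorithm proposed here; it simply checks the $k$-plex definition and lists all maximal $k$-plexes with at least $q$ vertices by exhaustive search, which is only feasible on tiny graphs (for example, as a reference for cross-checking a faster implementation). The helper names `build_adj`, `is_kplex`, and `maximal_kplexes`, and the dict-of-adjacency-sets graph representation, are hypothetical choices for illustration. The sketch relies on the fact that $k$-plexes are hereditary (every subset of a $k$-plex is a $k$-plex), so a $k$-plex is maximal exactly when no single outside vertex can be added to it.

```python
from itertools import combinations


def build_adj(n, edges):
    """Hypothetical helper: undirected graph on vertices 0..n-1 as a dict of neighbour sets."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_kplex(adj, vertices, k):
    """True if every vertex v of the set has at least |set| - k neighbours inside it
    (v itself is one of its at most k non-neighbours)."""
    s = set(vertices)
    return all(len(adj[v] & s) >= len(s) - k for v in s)


def maximal_kplexes(adj, k, q):
    """Exhaustively list all maximal k-plexes with at least q vertices.

    Exponential in |V|: a reference sketch only, not a branch-and-bound search.
    """
    nodes = sorted(adj)
    result = []
    for size in range(len(nodes), q - 1, -1):
        for cand in combinations(nodes, size):
            s = frozenset(cand)
            if not is_kplex(adj, s, k):
                continue
            # k-plexes are hereditary, so checking single-vertex extensions
            # is enough to decide maximality.
            if any(is_kplex(adj, s | {u}, k) for u in nodes if u not in s):
                continue
            result.append(s)
    return result


# Usage: a 4-cycle 0-1-2-3-0 with a pendant vertex 4 attached to 3.
adj = build_adj(5, [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)])
print(sorted(sorted(p) for p in maximal_kplexes(adj, k=2, q=3)))
# [[0, 1, 2, 3], [0, 3, 4], [2, 3, 4]]
```

With $k = 1$ the definition reduces to a clique, so `maximal_kplexes(adj, k=1, q=2)` returns the maximal cliques of the graph (here, its five edges, since it has no triangles).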