Mining cohesive subgraphs from a graph is a fundamental problem in graph data analysis. One notable cohesive structure is the $\gamma$-quasi-clique (QC), a subgraph in which each vertex is adjacent to at least a fraction $\gamma$ of the other vertices in the subgraph. Enumerating the maximal $\gamma$-quasi-cliques (MQCs) of a graph has been widely studied. A common practice for finding all MQCs is to (1) find a set of QCs that contains all MQCs and then (2) filter out the non-maximal QCs. Quite a few branch-and-bound algorithms have been developed for finding a set of QCs that contains all MQCs, but all of them focus on sharpening the pruning techniques and devote little effort to improving the branching part. As a result, they provide no guarantee on pruning branches, and all of them have a worst-case time complexity of $O^*(2^n)$, where $O^*$ suppresses polynomial factors and $n$ is the number of vertices in the graph. In this paper, we focus on the problem of finding a set of QCs containing all MQCs, but instead of further sharpening the pruning techniques as existing methods do, we attend to both the pruning and the branching parts, developing new pruning techniques and branching methods that suit each other better so as to prune more branches, both theoretically and practically. Specifically, we develop a new branch-and-bound algorithm called FastQC, based on the newly developed pruning techniques and branching methods, which improves the worst-case time complexity to $O^*(\alpha_k^n)$, where $\alpha_k$ is a positive real number strictly smaller than 2. Furthermore, we develop a divide-and-conquer strategy to boost the performance of FastQC. Finally, we conduct extensive experiments on both real and synthetic datasets, and the results show that our algorithms are up to two orders of magnitude faster than the state of the art on real datasets.
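To make the QC definition and the two-step practice concrete, the following is a minimal brute-force sketch, not FastQC and not any algorithm from this paper. It assumes the graph is given as a dict mapping each vertex to the set of its neighbours (no self-loops); the function names `is_quasi_clique` and `maximal_quasi_cliques` and the `min_size` threshold are hypothetical choices for illustration (a size threshold is assumed here because, under the bare definition, every single vertex is trivially a QC). Step (1) tests every vertex subset, which is exactly the exponential $O^*(2^n)$ search space that branch-and-bound algorithms try to prune.

```python
from fractions import Fraction
from itertools import combinations


def is_quasi_clique(adj, vertices, gamma):
    """Return True if every vertex in `vertices` is adjacent to at least
    gamma * (|vertices| - 1) of the other vertices in the set."""
    g = Fraction(str(gamma))  # exact arithmetic avoids float rounding at the threshold
    s = set(vertices)
    need = g * (len(s) - 1)
    # adj[v] & s counts v's neighbours inside the set (v itself is excluded,
    # since the graph is assumed to have no self-loops).
    return all(len(adj[v] & s) >= need for v in s)


def maximal_quasi_cliques(adj, gamma, min_size=2):
    """Hypothetical brute-force baseline: (1) collect every gamma-QC with at
    least `min_size` vertices, then (2) drop those contained in a larger QC."""
    nodes = sorted(adj)
    # Step (1): test all vertex subsets, O*(2^n) in the worst case.
    qcs = [frozenset(c)
           for k in range(min_size, len(nodes) + 1)
           for c in combinations(nodes, k)
           if is_quasi_clique(adj, c, gamma)]
    # Step (2): keep only QCs that are not a proper subset of another QC.
    return [q for q in qcs if not any(q < other for other in qcs)]
```

For example, on a 4-cycle 1-2-3-4 with a pendant vertex 5 attached to 4, `maximal_quasi_cliques(adj, 0.6)` keeps `{1, 2, 3, 4}` (each vertex has 2 of 3 possible neighbours inside, and $2 \ge 0.6 \cdot 3$) and the edge `{4, 5}`, while the other edges are filtered out in step (2) because they lie inside the 4-cycle.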