The densest subgraph problem has received significant attention in both theory and practice because of its applications in community detection, social network analysis, and spam detection. Because exact solutions are expensive to compute, much work has focused on approximate densest subgraph algorithms. However, existing approaches do not scale to massive graphs with billions of edges. In this paper, we introduce a new framework that combines approximate densest subgraph algorithms with a pruning optimization. We design new parallel variants of the state-of-the-art sequential Greedy++ algorithm and plug them into our framework, together with a parallel pruning technique based on $k$-core decomposition, to obtain parallel $(1+\varepsilon)$-approximate densest subgraph algorithms. On a single thread, our algorithms achieve a $2.6$--$34\times$ speedup over Greedy++, and they obtain up to a $22.37\times$ self-relative parallel speedup on a 30-core machine with two-way hyper-threading. Compared with the state-of-the-art parallel algorithm of Harb et al. [NeurIPS'22], we achieve up to a $114\times$ speedup on the same machine. Finally, against the recent sequential algorithm of Xu et al. [PACMMOD'23], we achieve up to a $25.9\times$ speedup. The scalability of our algorithms enables us to obtain, for the first time in the literature, near-optimal density statistics on the hyperlink2012 graph (roughly 113 billion edges) and the clueweb graph (roughly 37 billion edges).
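To make the combination of iterative peeling and $k$-core pruning concrete, the following is a minimal sequential sketch in Python. It is not the parallel algorithms of this paper: it assumes the standard density $|E(S)|/|S|$, the usual load-based description of Greedy++ (each round peels the vertex minimizing load plus current degree), and the standard fact that a densest subgraph lies inside the $\lceil \tilde\rho \rceil$-core for any lower bound $\tilde\rho$ on the optimal density. The function names `greedy_pp`, `k_core`, and `pruned_densest` are hypothetical and chosen only for illustration.

```python
import heapq
import math


def _adjacency(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def greedy_pp(n, edges, iterations=1):
    """Sequential Greedy++-style peeling on a simple graph with n >= 1 vertices.

    Returns (best_density, best_vertex_set). With iterations=1 this is
    plain greedy peeling. Illustrative sketch, not the paper's code.
    """
    adj = _adjacency(n, edges)
    load = [0] * n
    best_density, best_set = 0.0, set(range(n))
    for _ in range(iterations):
        deg = [len(a) for a in adj]
        removed = [False] * n
        heap = [(load[v] + deg[v], v) for v in range(n)]
        heapq.heapify(heap)
        m_left, n_left = len(edges), n
        order = []
        round_best, round_idx = m_left / n_left, 0
        while heap:
            key, v = heapq.heappop(heap)
            # Skip stale heap entries (lazy deletion).
            if removed[v] or key != load[v] + deg[v]:
                continue
            removed[v] = True
            order.append(v)
            load[v] += deg[v]  # charge v with its degree at removal time
            m_left -= deg[v]
            n_left -= 1
            for w in adj[v]:
                if not removed[w]:
                    deg[w] -= 1
                    heapq.heappush(heap, (load[w] + deg[w], w))
            if n_left > 0 and m_left / n_left > round_best:
                round_best, round_idx = m_left / n_left, len(order)
        if round_best > best_density:
            # Vertices still present when the best density was observed.
            best_density, best_set = round_best, set(order[round_idx:])
    return best_density, best_set


def k_core(n, edges, k):
    """Return the vertex set of the k-core (may be empty)."""
    adj = _adjacency(n, edges)
    deg = [len(a) for a in adj]
    removed = [d < k for d in deg]
    stack = [v for v in range(n) if removed[v]]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if not removed[w]:
                deg[w] -= 1
                if deg[w] < k:
                    removed[w] = True
                    stack.append(w)
    return {v for v in range(n) if not removed[v]}


def pruned_densest(n, edges, iterations=4):
    """Assumed pipeline: lower bound -> core pruning -> Greedy++ on the core."""
    lower, lower_set = greedy_pp(n, edges, iterations=1)
    core = sorted(k_core(n, edges, math.ceil(lower)))
    if not core:
        return lower, lower_set
    ids = {v: i for i, v in enumerate(core)}
    sub = [(ids[u], ids[v]) for u, v in edges if u in ids and v in ids]
    density, verts = greedy_pp(len(core), sub, iterations)
    if density < lower:
        return lower, lower_set
    return density, {core[i] for i in verts}
```

For example, on a 4-clique with a two-edge tail, `pruned_densest(6, [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3),(3,4),(4,5)])` discards the tail during pruning and runs the repeated peeling only on the clique. The point of pruning first is that every later peeling round touches only the core, which on real graphs can be much smaller than the input.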