Efficient $(α,β)$-core Computation and On-the-fly Query at Billion Scale with GPUs

In bipartite graphs, the $(α,β)$-core is a widely used model for cohesive subgraph mining. Specifically, an $(α,β)$-core is a maximal subgraph in which each vertex in the upper layer has degree at least $α$ and each vertex in the lower layer has degree at least $β$. State-of-the-art CPU-based solutions incur high costs to construct an index structure covering all combinations of $α$ and $β$, which limits their scalability on large bipartite graphs. Moreover, on-the-fly queries, which determine whether an updated edge belongs to a target $(α,β)$-core, are essential for real-time applications such as fraud monitoring and recommendation systems. Existing index-based methods struggle to support such queries at scale because of their high index maintenance overhead. In this paper, we investigate how to leverage GPU architectures to enable efficient $(α,β)$-core computation and to support on-the-fly queries. Although GPUs are widely used to accelerate graph processing, their limited memory capacity makes it impractical to store large index structures. To address this issue, we propose GCC, an index-free GPU-based peeling algorithm that accelerates $(α,β)$-core computation via warp-centric processing. To further improve efficiency, we develop GCC+, which exploits the nested property of $(α,β)$-cores through a core-based early pruning strategy. For on-the-fly queries, we propose GFQ, a connectivity-aware algorithm that uses connected component information to significantly narrow the computation scope, thereby avoiding full-graph peeling. Extensive experiments on 11 datasets demonstrate that our proposed techniques outperform existing CPU-based solutions in both space and time efficiency.
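To make the definition concrete, the following is a minimal sequential sketch of index-free peeling for a single $(α,β)$ pair. It is a plain CPU reference written for illustration only and is not the GCC, GCC+, or GFQ algorithm; the function name `alpha_beta_core` and the edge-list input format are assumptions introduced here. The idea is to repeatedly remove upper-layer vertices with degree below $α$ and lower-layer vertices with degree below $β$ until no such vertex remains.

```python
from collections import deque


def alpha_beta_core(edges, alpha, beta):
    """Return (upper, lower) vertex sets of the (alpha, beta)-core.

    edges: iterable of (u, v) pairs, u in the upper layer, v in the lower layer.
    Hypothetical helper for illustration; sequential, not GPU-based.
    """
    adj_u, adj_l = {}, {}
    for u, v in set(edges):  # drop duplicate edges
        adj_u.setdefault(u, set()).add(v)
        adj_l.setdefault(v, set()).add(u)

    deg_u = {u: len(nbrs) for u, nbrs in adj_u.items()}
    deg_l = {v: len(nbrs) for v, nbrs in adj_l.items()}

    # Seed the queue with every vertex that already violates its threshold.
    queue = deque(("U", u) for u, d in deg_u.items() if d < alpha)
    queue.extend(("L", v) for v, d in deg_l.items() if d < beta)
    removed = set(queue)

    while queue:
        side, x = queue.popleft()
        if side == "U":
            # Removing an upper vertex lowers the degree of its lower neighbors.
            for v in adj_u[x]:
                if ("L", v) in removed:
                    continue
                deg_l[v] -= 1
                if deg_l[v] < beta:
                    removed.add(("L", v))
                    queue.append(("L", v))
        else:
            # Removing a lower vertex lowers the degree of its upper neighbors.
            for u in adj_l[x]:
                if ("U", u) in removed:
                    continue
                deg_u[u] -= 1
                if deg_u[u] < alpha:
                    removed.add(("U", u))
                    queue.append(("U", u))

    upper = {u for u in adj_u if ("U", u) not in removed}
    lower = {v for v in adj_l if ("L", v) not in removed}
    return upper, lower


# Toy graph: a complete 2x2 block plus one pendant upper vertex (2).
toy_edges = [(0, 10), (0, 11), (1, 10), (1, 11), (2, 10)]
core_upper, core_lower = alpha_beta_core(toy_edges, alpha=2, beta=2)
```

On this toy graph the pendant upper vertex is peeled away and the 2x2 block survives as the $(2,2)$-core. The nested property mentioned above is also visible here: the $(2,2)$-core is contained in the $(1,1)$-core, so a computation for larger thresholds can in principle start from a smaller core rather than from the full graph.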
