Accelerating Biclique Counting on GPU

Counting (p,q)-bicliques in bipartite graphs is a foundational problem with broad applications, from densest subgraph discovery in algorithmic research to personalized content recommendation in practical scenarios. Despite its significance, current leading (p,q)-biclique counting algorithms fall short, particularly on larger graphs and larger biclique sizes. Fortunately, the problem's inherent structure, which allows the bicliques rooted at each vertex to be counted independently, combined with a substantial number of set intersections, makes it highly amenable to parallelization. Recent successes of GPU-accelerated algorithms across various domains motivate us to explore harnessing the parallel processing power of GPUs to address the (p,q)-biclique counting problem efficiently. We introduce GBC (GPU-based Biclique Counting), a novel approach designed to enable efficient and scalable (p,q)-biclique counting on GPUs. To address the major bottleneck arising from redundant comparisons in set intersections (which account for 90% of the runtime on average), we introduce a novel data structure that hashes adjacency lists into truncated bitmaps, enabling efficient set intersection on GPUs via bit-wise AND operations. Our hybrid DFS-BFS exploration strategy further improves thread utilization and effectively manages memory constraints. A composite load balancing strategy, integrating pre-runtime and runtime workload allocation, ensures equitable distribution of work among threads. Additionally, we employ vertex reordering and graph partitioning strategies for improved compactness and scalability. Experimental evaluations on eight real-life and two synthetic datasets demonstrate that GBC outperforms state-of-the-art algorithms by a substantial margin. In particular, GBC achieves an average speedup of 497.8x, with the largest instance achieving a 1217.7x speedup when p = q = 8.
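To make the role of bitmap-based set intersection concrete, the following is a minimal CPU-side sketch in Python, not the GBC implementation. It assumes a small bipartite graph given as a dictionary from left vertices to right-vertex ids, uses plain (untruncated, unhashed) bitmaps stored in Python integers, and enumerates left-vertex groups exhaustively rather than with the hybrid DFS-BFS strategy. The names `to_bitmap` and `count_bicliques` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from math import comb


def to_bitmap(neighbors):
    """Pack right-side vertex ids into a single integer used as a bitmap."""
    bits = 0
    for v in neighbors:
        bits |= 1 << v
    return bits


def count_bicliques(adj, p, q):
    """Count (p,q)-bicliques by brute force.

    adj maps each left vertex to an iterable of right-vertex ids.
    A (p,q)-biclique is a choice of p left vertices together with
    q right vertices that are adjacent to all of them.
    """
    bitmaps = [to_bitmap(nbrs) for nbrs in adj.values()]
    total = 0
    for group in combinations(bitmaps, p):
        common = group[0]
        for b in group[1:]:
            common &= b  # set intersection as a bit-wise AND
        # Any q of the common neighbors complete a biclique with this group.
        total += comb(bin(common).count("1"), q)
    return total


# Illustrative toy graph (not from the paper's datasets).
adj = {0: [0, 1, 2], 1: [0, 1, 2], 2: [1, 2, 3]}
print(count_bicliques(adj, 2, 2))
```

Each group of p left vertices is processed independently here, which is the property the abstract identifies as making the problem amenable to parallelization. The exhaustive enumeration is only practical for tiny inputs.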
