Accelerating Biclique Counting on GPU

Counting (p,q)-bicliques in bipartite graphs is a fundamental problem with broad applications, from densest subgraph discovery in algorithmic research to personalized content recommendation in practice. Despite its significance, current state-of-the-art (p,q)-biclique counting algorithms scale poorly as graph sizes and biclique sizes grow. Fortunately, the problem's inherent structure, which allows the bicliques rooted at each vertex to be counted independently and involves a large number of set intersections, makes it highly amenable to parallelization. Recent successes of GPU-accelerated algorithms across various domains motivate us to harness the parallel processing power of GPUs for (p,q)-biclique counting. We introduce GBC (GPU-based Biclique Counting), a novel approach designed to enable efficient and scalable (p,q)-biclique counting on GPUs. To address the major bottleneck arising from redundant comparisons in set intersections (which account for 90% of the runtime on average), we introduce a novel data structure that hashes adjacency lists into truncated bitmaps, enabling efficient set intersection on GPUs via bitwise AND operations. Our hybrid DFS-BFS exploration strategy further improves thread utilization and keeps memory usage within GPU limits. A composite load balancing strategy, integrating pre-runtime and runtime workload allocation, distributes work evenly among threads. Additionally, we employ vertex reordering and graph partitioning strategies for improved compactness and scalability. Experimental evaluations on eight real-life and two synthetic datasets demonstrate that GBC outperforms state-of-the-art algorithms by a substantial margin. In particular, when p = q = 8, GBC achieves an average speedup of 497.8x, with the largest reaching 1217.7x.
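To make the role of set intersection concrete, the following is a minimal CPU sketch in Python, not GBC's implementation. It assumes plain (untruncated) bitmaps stored in Python integers, and the helper names `to_bitmap` and `count_bicliques` are hypothetical. It counts (p,q)-bicliques by brute force: for every choice of p left-side vertices, it intersects their neighbor sets with bitwise AND and then counts the ways to pick q common neighbors.

```python
from itertools import combinations
from math import comb


def to_bitmap(neighbors):
    """Encode a list of right-side vertex ids as an integer bitmap."""
    bits = 0
    for v in neighbors:
        bits |= 1 << v
    return bits


def count_bicliques(adj_left, p, q):
    """Count (p,q)-bicliques, assuming p >= 1 and q >= 1.

    adj_left[u] lists the right-side neighbors of left vertex u.
    This is an illustrative brute-force enumeration, not the
    hybrid DFS-BFS strategy described in the abstract.
    """
    bitmaps = [to_bitmap(nbrs) for nbrs in adj_left]
    total = 0
    for subset in combinations(bitmaps, p):
        # Set intersection of the p neighbor sets via bitwise AND.
        common = subset[0]
        for b in subset[1:]:
            common &= b
        # Any q of the common neighbors complete a (p,q)-biclique.
        total += comb(bin(common).count("1"), q)
    return total


# Small example: 3 left vertices, right vertices 0..2.
example = [[0, 1, 2], [0, 1], [1, 2]]
print(count_bicliques(example, 2, 2))
```

Each pair of left vertices contributes one independent intersection, which is the property that makes the problem a natural fit for parallel hardware; on a GPU the AND and population count would operate on fixed-width machine words rather than arbitrary-precision integers.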
