Multi-core & GPU-based Balanced Butterfly Counting in Signed Bipartite Graphs

Balanced butterfly counting, i.e., counting balanced (2, 2)-bicliques, is a fundamental primitive in the analysis of signed bipartite graphs and provides a basis for studying higher-order structural properties such as clustering coefficients and community structure. Although prior work has proposed an efficient CPU-based serial method for counting balanced (2, k)-bicliques, the computational cost of balanced butterfly counting remains a major bottleneck on large-scale graphs. In this work, we present highly parallel implementations of balanced butterfly counting for both multi-core CPUs and GPUs. The proposed multi-core algorithm (M-BBC) employs fine-grained vertex-level parallelism to accelerate wedge-based counting without generating unbalanced substructures. To improve scalability, we develop a GPU-based method (G-BBC) that uses a tile-based parallel approach to exploit shared memory effectively while handling large vertex sets. We then present an improved variant, G-BBC++, which integrates dynamic scheduling to mitigate workload imbalance and maximize throughput. We experimentally evaluate the proposed methods on 15 real-world datasets. The results show that M-BBC achieves speedups of up to 71.13x (38.13x on average) over the sequential baseline BB2K. The GPU-based algorithms deliver even greater improvements, achieving up to 13,320x speedup (2,600x on average) over BB2K and outperforming M-BBC by up to 186x (50x on average). These results demonstrate the scalability and efficiency of our parallel algorithms and establish a robust foundation for high-performance signed motif analysis on massive bipartite graphs.
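To make the counted object and the idea of wedge-based counting concrete, the following is a minimal sequential sketch. It assumes the standard structural-balance definition: a butterfly (a 4-cycle u1-v1-u2-v2) is balanced when the product of its four edge signs is positive. Under that assumption, two wedges u1-v-u2 close a balanced butterfly exactly when their sign products agree, so a pair of same-side vertices with p positive and n negative wedges contributes C(p, 2) + C(n, 2) balanced butterflies, and unbalanced combinations are never enumerated. The function names are hypothetical, the input is assumed to be a simple graph given as (u, v, sign) triples, and this is not the BB2K, M-BBC, or G-BBC implementation.

```python
from collections import defaultdict
from itertools import combinations


def balanced_butterflies_wedge(edges):
    """Count balanced butterflies by aggregating signed wedges (hypothetical helper).

    edges: iterable of (u, v, sign), u on side U, v on side V, sign in {+1, -1}.
    Assumes a simple graph and mutually comparable U-side vertex ids.
    """
    # Group the signed U-side neighbours of every V-side vertex.
    adj_v = defaultdict(list)
    for u, v, s in edges:
        adj_v[v].append((u, s))

    # For every U-side pair (u, w), count wedges u-v-w by the sign of s(u,v)*s(w,v).
    pos = defaultdict(int)
    neg = defaultdict(int)
    for nbrs in adj_v.values():
        # Sorting makes each pair key canonical: (smaller id, larger id).
        for (u, su), (w, sw) in combinations(sorted(nbrs), 2):
            if su * sw > 0:
                pos[(u, w)] += 1
            else:
                neg[(u, w)] += 1

    total = 0
    for key in set(pos) | set(neg):
        p, n = pos[key], neg[key]
        # Two wedges with equal sign product form a butterfly whose sign product is +1.
        total += p * (p - 1) // 2 + n * (n - 1) // 2
    return total


def balanced_butterflies_bruteforce(edges):
    """Reference check: enumerate every (2, 2)-biclique and test its sign product."""
    sign = {(u, v): s for u, v, s in edges}
    us = sorted({u for u, _, _ in edges})
    vs = sorted({v for _, v, _ in edges})
    count = 0
    for u1, u2 in combinations(us, 2):
        for v1, v2 in combinations(vs, 2):
            quad = [(u1, v1), (u1, v2), (u2, v1), (u2, v2)]
            if all(e in sign for e in quad):
                prod = 1
                for e in quad:
                    prod *= sign[e]
                if prod > 0:
                    count += 1
    return count


if __name__ == "__main__":
    example = [(0, "a", 1), (0, "b", 1), (0, "c", -1),
               (1, "a", 1), (1, "b", -1), (1, "c", -1)]
    print(balanced_butterflies_wedge(example))  # prints 1
```

In the example, vertices 0 and 1 share three wedges with sign products +1 (via "a"), -1 (via "b"), and +1 (via "c"), so only the butterfly on {a, c} is balanced. The outer loop over V-side vertices is a natural unit for vertex-level parallelism; this sketch does not attempt to reproduce the scheduling, tiling, or shared-memory layout of the parallel algorithms described above.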

翻译：平衡蝴蝶计数（对应平衡(2,2)-双团计数）是符号二分图分析中的基本原语，为研究聚类系数和社区结构等高阶结构特性提供了基础。尽管先前研究已提出一种基于CPU的高效串行方法来计数平衡(2,k)-双团，但平衡蝴蝶计算的计算成本在大规模图上仍是主要瓶颈。本文提出了面向多核CPU和GPU的高度并行化平衡蝴蝶计数实现方案。所提出的多核算法（M-BBC）采用细粒度的顶点级并行策略，在避免生成不平衡子结构的同时加速基于楔形的计数过程。为提升可扩展性，我们开发了基于GPU的方法（G-BBC），该方法采用基于分块的并行策略，在有效利用共享内存的同时处理大规模顶点集。随后我们提出改进版本G-BBC++，该版本集成动态调度机制以缓解工作负载不均衡问题并最大化吞吐量。我们在15个真实数据集上对所提方法进行了实验评估。实验结果表明，M-BBC相较于串行基线BB2K最高可实现71.13倍（平均38.13倍）加速。基于GPU的算法展现出更显著的性能提升，较BB2K最高实现13,320倍（平均2,600倍）加速，并较M-BBC最高提升186倍（平均50倍）。这些结果证明了我们并行算法具有显著的可扩展性和高效性，为海量二分图上的高性能符号模体分析奠定了坚实基础。