Maximal Biclique Enumeration (MBE) is a fundamental problem in graph theory, with applications in bioinformatics, social networks, and recommendation systems. However, its computational complexity makes it difficult to scale efficiently to large graphs. To address this challenge, we introduce cuMBE, a GPU-optimized parallel algorithm for MBE. Using a data structure called the compact array, cuMBE eliminates the need for recursion, thereby significantly reducing dynamic memory requirements and computational overhead. The algorithm adopts a hybrid parallelism approach in which GPU thread blocks handle coarse-grained tasks, each corresponding to part of the search process. In addition, we implement three fine-grained optimizations within each thread block to enhance performance, and we integrate a work-stealing mechanism to mitigate workload imbalance among thread blocks. Our experiments show that cuMBE achieves geometric mean speedups of 4.02x and 4.13x over the state-of-the-art serial algorithm and parallel CPU-based algorithm, respectively, on both common and real-world datasets.
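To make the problem concrete, the following is a minimal CPU-side sketch in Python of enumerating maximal bicliques without recursion, using an explicit worklist in place of the call stack. It is not cuMBE and does not model the compact array, the thread-block parallelism, or work stealing. The function name `maximal_bicliques` and the input layout (a mapping from each right-side vertex to its set of left-side neighbors) are assumptions made for illustration only.

```python
def maximal_bicliques(adj_right):
    """Enumerate maximal bicliques of a bipartite graph (illustrative sketch).

    adj_right maps each right-side vertex to the set of its left-side
    neighbors (a hypothetical input layout). Returns a dict mapping the
    left side of each maximal biclique to its right side.
    """
    adj = {v: frozenset(ns) for v, ns in adj_right.items()}
    all_left = frozenset().union(*adj.values())

    found = {}          # left vertex set -> right vertex set
    stack = [all_left]  # explicit worklist instead of recursive calls

    while stack:
        left = stack.pop()
        for v in adj:
            # Shrink the left side to the vertices that are also adjacent to v.
            new_left = left & adj[v]
            if not new_left or new_left in found:
                continue
            # The right side is every right vertex adjacent to all of new_left,
            # which makes the pair (new_left, right) maximal on both sides.
            right = frozenset(u for u in adj if new_left <= adj[u])
            found[new_left] = right
            stack.append(new_left)
    return found


if __name__ == "__main__":
    graph = {"a": {0, 1}, "b": {0, 1, 2}, "c": {1, 2}}
    for left, right in maximal_bicliques(graph).items():
        print(sorted(left), sorted(right))
```

The sketch deduplicates by left vertex set and retries every right vertex from each state, so it is far less efficient than a pruned search. It is meant only to show what is being enumerated and how an explicit worklist can stand in for recursion.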