Probabilistic breadth-first traversals (BPTs) are used in many network science and graph machine learning applications. This paper is motivated by the use of BPTs in stochastic diffusion-based graph problems such as influence maximization. These applications rely heavily on BPTs to implement the Monte Carlo sampling step of their approximation schemes. Given the large sampling complexity, the stochasticity of the diffusion process, and the inherent irregularity of real-world graph topologies, efficiently parallelizing these BPTs remains a significant challenge. We present a new algorithm that fuses a massive number of concurrently executing BPTs with random starts on the input graph. The algorithm fuses BPTs by combining the separate traversals into a unified frontier on distributed multi-GPU systems. To show the general applicability of the fused BPT technique, we have incorporated it into two state-of-the-art parallel implementations of influence maximization (gIM and Ripples). Our experiments on up to 4K nodes of the OLCF Frontier supercomputer (32,768 GPUs and 196K CPU cores) show strong scaling behavior and that fused BPTs can improve the performance of these implementations by up to 34$\times$ (for gIM) and approximately 360$\times$ (for Ripples).
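The abstract does not spell out how the traversals are fused, so the following is only a minimal single-process sketch of the general idea of a unified frontier, not the authors' multi-GPU algorithm. It assumes an independent-cascade-style diffusion with one uniform edge probability `p`, and it tracks which traversals have reached each vertex with a per-vertex bitmask. The function name `fused_bpts` and all of its parameters are hypothetical.

```python
import random


def fused_bpts(adj, starts, p, seed=0):
    """Run len(starts) probabilistic BFS traversals in one fused sweep.

    adj    : dict mapping each vertex to a list of out-neighbors
    starts : starts[i] is the start vertex of traversal i
    p      : edge activation probability (assumed uniform; illustrative only)
    Returns a dict vertex -> bitmask; bit i is set if traversal i reached it.
    """
    rng = random.Random(seed)

    # Unified frontier: vertex -> bitmask of traversals that just arrived there.
    frontier = {}
    for i, s in enumerate(starts):
        frontier[s] = frontier.get(s, 0) | (1 << i)
    visited = dict(frontier)

    while frontier:
        next_frontier = {}
        for u, mask in frontier.items():
            for v in adj.get(u, ()):
                # Each traversal active at u flips its own coin for edge (u, v).
                passed = 0
                m = mask
                while m:
                    low = m & -m          # lowest set bit = one traversal
                    if rng.random() < p:
                        passed |= low
                    m ^= low
                # Keep only traversals that have not reached v before.
                new = passed & ~visited.get(v, 0)
                if new:
                    visited[v] = visited.get(v, 0) | new
                    next_frontier[v] = next_frontier.get(v, 0) | new
        frontier = next_frontier
    return visited
```

In this sketch each vertex is expanded once per level regardless of how many traversals are active on it, so its adjacency list is read once for all of them rather than once per traversal. Because a traversal's bit enters the frontier at a vertex only when it first reaches that vertex, each edge is tried at most once per traversal.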