Stochastic block partitioning (SBP) is a community detection algorithm that remains highly accurate even on graphs with complex community structure, but its inherently serial nature hinders its adoption by the wider scientific community. Making SBP practical for analyzing large real-world graphs requires parallelizing and distributing the algorithm. The current state-of-the-art distributed SBP algorithm is a divide-and-conquer approach that limits communication between compute nodes until the end of inference. This breaks computational dependencies, which causes convergence issues as the number of compute nodes increases and when the graph is sufficiently sparse. In this paper, we introduce EDiSt, an exact distributed stochastic block partitioning algorithm. Under EDiSt, compute nodes periodically share community assignments during inference. Because of this additional communication, EDiSt scales out to a larger number of compute nodes than the divide-and-conquer algorithm without suffering from convergence issues, even on sparse graphs. We show that EDiSt provides speedups of up to 23.8X over the divide-and-conquer approach and up to 38.0X over shared-memory parallel SBP when scaled out to 64 compute nodes.
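The idea of compute nodes periodically sharing community assignments can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the EDiSt implementation: the workers are plain Python lists rather than real compute nodes, the vertex ownership scheme is a simple round-robin split, and a toy "adopt the most common neighbor label" rule stands in for SBP's actual inference moves. The names `local_sweep`, `all_gather`, `run`, and `sync_every` are hypothetical.

```python
import random

def local_sweep(adj, labels, owned, rng):
    """Toy stand-in for an SBP sweep (assumption, not the real update):
    move each owned vertex to the most common label among its neighbors."""
    for v in owned:
        if not adj[v]:
            continue
        counts = {}
        for u in adj[v]:
            counts[labels[u]] = counts.get(labels[u], 0) + 1
        best = max(counts.values())
        # Break ties at random among the most frequent labels.
        labels[v] = rng.choice(sorted(l for l, c in counts.items() if c == best))

def all_gather(local_copies, owners):
    """Merge every worker's owned assignments into one vector and
    hand each worker a fresh copy of the merged result."""
    merged = [None] * len(local_copies[0])
    for rank, owned in enumerate(owners):
        for v in owned:
            merged[v] = local_copies[rank][v]
    return [list(merged) for _ in local_copies]

def run(adj, num_workers, sweeps, sync_every, seed=0):
    rng = random.Random(seed)
    n = len(adj)
    # Hypothetical round-robin ownership: worker r owns vertices r, r+W, ...
    owners = [list(range(r, n, num_workers)) for r in range(num_workers)]
    # Every worker starts with each vertex in its own community.
    copies = [list(range(n)) for _ in range(num_workers)]
    for sweep in range(1, sweeps + 1):
        for rank in range(num_workers):
            # Each worker only updates its own vertices, reading its local copy.
            local_sweep(adj, copies[rank], owners[rank], rng)
        if sweep % sync_every == 0:
            # Periodic exchange of community assignments between workers.
            copies = all_gather(copies, owners)
    return copies, owners

# Two triangles joined by a single edge (vertices 2 and 3).
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
copies, owners = run(adj, num_workers=2, sweeps=4, sync_every=2)
```

In this sketch, setting `sync_every` larger than `sweeps` means no exchange happens during the run, so each worker's view of the other workers' vertices stays stale. That loosely mirrors the limited-communication setting described above, while a small `sync_every` keeps the local views consistent at the cost of more communication.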