This paper proposes a distributed pseudo-likelihood (DPL) method for identifying the community structure of large-scale networks. We first propose a block-wise splitting method that divides the network data into several subnetworks and distributes them among multiple workers. For simplicity, we assume the classical stochastic block model. The DPL algorithm then iteratively optimizes the sum of the local pseudo-likelihood functions in a distributed manner. At each iteration, every worker updates its local community labels and communicates with the master. The master then broadcasts the combined estimator to each worker for the next iteration. By exploiting the distributed system, DPL significantly reduces the computational complexity relative to the traditional pseudo-likelihood method run on a single machine. Furthermore, to ensure statistical accuracy, we theoretically discuss the requirements on the worker sample size. Moreover, we extend the DPL method to the estimation of degree-corrected stochastic block models. Extensive numerical studies and a real data analysis demonstrate the superior performance of the proposed distributed algorithm.
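The master/worker iteration described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes a row-block split in which each worker holds the adjacency rows of its own nodes, it uses hard label assignments under a Poisson pseudo-likelihood, and it runs the "workers" sequentially in one process. All function names (`simulate_sbm`, `split_rows`, `block_sums`, `local_stats`, `master_step`, `worker_step`, `dpl_sketch`) are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_sbm(n, p_in, p_out, rng):
    """Toy two-community SBM; returns a dense 0/1 adjacency matrix and true labels."""
    truth = [i % 2 for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p_in if truth[i] == truth[j] else p_out):
                adj[i][j] = adj[j][i] = 1
    return adj, truth


def split_rows(n, n_workers):
    """Assumed splitting rule: contiguous, near-equal blocks of node indices."""
    size = math.ceil(n / n_workers)
    return [list(range(s, min(s + size, n))) for s in range(0, n, size)]


def block_sums(adj, rows, labels, k):
    """For each local node, count its neighbours in each current community."""
    sums = []
    for i in rows:
        b = [0] * k
        for j, a_ij in enumerate(adj[i]):
            if a_ij:
                b[labels[j]] += 1
        sums.append(b)
    return sums


def local_stats(sums, local_labels, k):
    """Sufficient statistics a worker sends to the master."""
    counts = [0] * k
    totals = [[0.0] * k for _ in range(k)]
    for z, b in zip(local_labels, sums):
        counts[z] += 1
        for c in range(k):
            totals[z][c] += b[c]
    return counts, totals


def master_step(stats, k, eps=1e-6):
    """Combine worker statistics into global proportions and Poisson rates."""
    counts = [sum(s[0][g] for s in stats) for g in range(k)]
    n = sum(counts)
    pi = [(counts[g] + eps) / (n + k * eps) for g in range(k)]
    lam = [[(sum(s[1][g][c] for s in stats) + eps) / (counts[g] + eps)
            for c in range(k)] for g in range(k)]
    return pi, lam


def worker_step(adj, rows, labels, pi, lam, k):
    """Relabel local nodes by maximising a Poisson pseudo-log-likelihood."""
    sums = block_sums(adj, rows, labels, k)
    new = []
    for b in sums:
        scores = [math.log(pi[g])
                  + sum(b[c] * math.log(lam[g][c]) - lam[g][c] for c in range(k))
                  for g in range(k)]
        new.append(max(range(k), key=scores.__getitem__))
    return new, local_stats(sums, new, k)


def dpl_sketch(adj, init_labels, k, n_workers, n_iter=10):
    """Run the simplified master/worker loop sequentially in one process."""
    blocks = split_rows(len(adj), n_workers)
    labels = list(init_labels)
    # Initial global parameters, computed from the initial labels.
    stats = [local_stats(block_sums(adj, rows, labels, k),
                         [labels[i] for i in rows], k) for rows in blocks]
    pi, lam = master_step(stats, k)
    for _ in range(n_iter):
        # Every worker sees the same broadcast labels and parameters.
        results = [worker_step(adj, rows, labels, pi, lam, k) for rows in blocks]
        for rows, (new, _) in zip(blocks, results):
            for i, z in zip(rows, new):
                labels[i] = z
        pi, lam = master_step([st for _, st in results], k)
    return labels
```

A call such as `dpl_sketch(adj, init_labels, k=2, n_workers=4)` on a network from `simulate_sbm` returns one label per node. In this sketch each worker only touches its own rows of the adjacency matrix, so the per-worker work per iteration scales with the block size rather than with the full network, which is the kind of saving the abstract refers to. A real deployment would replace the sequential list comprehension with actual inter-process communication.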