Distributed Algorithms for Large-Scale Graphs

Motivated by the increasing need for fast processing of large-scale graphs, we study a number of fundamental graph problems in a message-passing model for distributed computing, called $k$-machine model, where we have $k$ machines that jointly perform computations on $n$-node graphs. The graph is assumed to be partitioned in a balanced fashion among the $k$ machines, a common implementation in many real-world systems. Communication is point-to-point via bandwidth-constrained links, and the goal is to minimize the round complexity, i.e., the number of communication rounds required to finish a computation. We present a generic methodology that allows to obtain efficient algorithms in the $k$-machine model using distributed algorithms for the classical CONGEST model of distributed computing. Using this methodology, we obtain algorithms for various fundamental graph problems such as connectivity, minimum spanning trees, shortest paths, maximal independent sets, and finding subgraphs, showing that many of these problems can be solved in $\tilde{O}(n/k)$ rounds; this shows that one can achieve speedup nearly linear in $k$. To complement our upper bounds, we present lower bounds on the round complexity that quantify the fundamental limitations of solving graph problems distributively. We first show a lower bound of $\Omega(n/k)$ rounds for computing a spanning tree of the input graph. This result implies the same bound for other fundamental problems such as computing a minimum spanning tree, breadth-first tree, or shortest paths tree. We also show a $\tilde \Omega(n/k^2)$ lower bound for connectivity, spanning tree verification and other related problems. The latter lower bounds follow from the development and application of novel results in a random-partition variant of the classical communication complexity model.

翻译：受大规模图快速处理需求的日益增长驱动，我们在一种称为$k$机模型的分布式计算消息传递模型中研究若干基本图问题。该模型包含$k台机器协同处理n$节点图，图数据采用平衡划分方式分布至各机器——这是实际系统常见的实现方案。机器间通过带宽受限链路进行点对点通信，优化目标是最小化轮数复杂度（即完成计算所需的通信轮数）。我们提出一种通用方法论，可基于经典分布式计算CONGEST模型的分布式算法，在$k$机模型中构建高效算法。通过该方法，我们获得连通性、最小生成树、最短路径、最大独立集及子图查找等基本图问题的求解算法，证明多数问题可在$\tilde{O}(n/k)$轮内解决，这表明可实现近乎线性于$k$的加速效果。为补充上界，我们给出量化分布式图计算根本局限性的轮数复杂度下界。首先证明输入图生成树计算的下界为$\Omega(n/k)$轮，该结果可推至最小生成树、广度优先树或最短路径树等基础问题。进一步证得连通性、生成树验证及相关问题的$\tilde \Omega(n/k^2)$下界，后者源于经典通信复杂度模型随机划分变体中新型结果的开发与应用。