PROPAGATE: A Seed Propagation Framework to Compute Distance-Based Metrics on Very Large Graphs

We propose PROPAGATE, a fast approximation framework that estimates, within a small error, distance-based metrics on very large graphs, such as the (effective) diameter, the (effective) radius, or the average distance. The framework assigns seeds to nodes and propagates them in a BFS-like fashion, growing the neighborhood sets until they cover either the whole vertex set (for the diameter) or a given percentage of it (for the effective diameter). At each iteration, we derive compressed Boolean representations of the neighborhood sets discovered so far. The PROPAGATE framework yields two algorithms: PROPAGATE-P, which propagates all $s$ seeds in parallel, and PROPAGATE-S, which propagates the seeds sequentially. For each node, the compressed representation used by PROPAGATE-P requires $s$ bits, while that of PROPAGATE-S requires only $1$ bit. Both algorithms compute the average distance, the effective diameter, the diameter, and the connectivity rate within a small error with high probability: for any $\varepsilon>0$ and using $s=\Theta\left(\frac{\log n}{\varepsilon^2}\right)$ sample nodes, the error for the average distance is bounded by $\xi = \frac{\varepsilon \Delta}{\alpha}$, the errors for the effective diameter and the diameter are bounded by $\xi = \frac{\varepsilon}{\alpha}$, and the error for the connectivity rate is bounded by $\varepsilon$, where $\Delta$ is the diameter and $\alpha$ is a measure of the connectivity of the graph. The time complexity is $\mathcal{O}\left(m\Delta \frac{\log n}{\varepsilon^2}\right)$, where $m$ is the number of edges of the graph. The experimental results show that the PROPAGATE framework improves on the current state of the art in both accuracy and speed. Moreover, we show experimentally that PROPAGATE-S is also very efficient for solving the All-Pairs Shortest Path problem on very large graphs.
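The parallel variant described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `propagate_p`, the adjacency-dictionary input, the use of Python integers as $s$-bit masks, the assumption of an undirected graph, and the way the average distance and effective diameter are read off the per-hop counts are all assumptions made for illustration. Each node keeps one bit per seed, and one iteration ORs the masks of a node's neighbors into its own, so after $h$ iterations bit $i$ of a node is set exactly when seed $i$ lies within $h$ hops.

```python
import random


def propagate_p(adj, s, seed=0, fraction=0.9):
    """Illustrative seed propagation on an undirected graph.

    adj: dict mapping each node to an iterable of its neighbors (assumed format).
    s: number of sampled seed nodes.
    Returns (diameter_estimate, average_distance_estimate, effective_diameter_estimate).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    seeds = rng.sample(nodes, min(s, len(nodes)))

    # One s-bit mask per node: bit i is set once seed i has reached the node.
    mask = {v: 0 for v in nodes}
    for i, v in enumerate(seeds):
        mask[v] |= 1 << i

    def count_bits(masks):
        return sum(bin(m).count("1") for m in masks.values())

    # reached[h] = number of (seed, node) pairs at distance <= h.
    reached = [count_bits(mask)]
    while True:
        new_mask = {}
        for v in nodes:
            m = mask[v]
            for u in adj[v]:
                m |= mask[u]  # bitwise OR merges the neighbor's seed set
            new_mask[v] = m
        total = count_bits(new_mask)
        if total == reached[-1]:  # no new pair discovered: propagation is done
            break
        mask = new_mask
        reached.append(total)

    diameter = len(reached) - 1
    pairs = reached[-1] - reached[0]  # reachable pairs at distance >= 1
    if pairs:
        avg = sum(h * (reached[h] - reached[h - 1])
                  for h in range(1, len(reached))) / pairs
    else:
        avg = 0.0
    # Smallest hop count covering the requested fraction of reachable pairs.
    eff = next(h for h, r in enumerate(reached) if r >= fraction * reached[-1])
    return diameter, avg, eff
```

On a four-node path with every node used as a seed, the estimates are exact: `propagate_p({0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}, s=4)` yields a diameter of 3 and an average distance of 5/3. A sequential variant in the spirit of PROPAGATE-S would run the same loop once per seed with a single bit per node, trading the $s$-bit masks for $s$ passes.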
