We propose PROPAGATE, a fast approximation framework that estimates distance-based metrics on very large graphs, such as the (effective) diameter, the (effective) radius, and the average distance, within a small error. The framework assigns seeds to nodes and propagates them in a BFS-like fashion, computing the neighborhood sets until they cover either the whole vertex set (for the diameter) or a given percentage of it (for the effective diameter). At each iteration, we derive compressed Boolean representations of the neighborhood sets discovered so far. The PROPAGATE framework yields two algorithms: PROPAGATE-P, which propagates all $s$ seeds in parallel, and PROPAGATE-S, which propagates the seeds sequentially. For each node, the compressed representation used by PROPAGATE-P requires $s$ bits, while that of PROPAGATE-S requires only $1$ bit. Both algorithms compute the average distance, the effective diameter, the diameter, and the connectivity rate within a small error with high probability: for any $\varepsilon>0$ and using $s=\Theta\left(\frac{\log n}{\varepsilon^2}\right)$ sample nodes, the error for the average distance is bounded by $\xi = \frac{\varepsilon \Delta}{\alpha}$, the errors for the effective diameter and the diameter are bounded by $\xi = \frac{\varepsilon}{\alpha}$, and the error for the connectivity rate is bounded by $\varepsilon$, where $\Delta$ is the diameter and $\alpha$ is a measure of the connectivity of the graph. The time complexity is $\mathcal{O}\left(m\Delta \frac{\log n}{\varepsilon^2}\right)$, where $m$ is the number of edges of the graph. The experimental results show that the PROPAGATE framework improves on the current state of the art in both accuracy and speed. Moreover, we show experimentally that PROPAGATE-S is also very efficient for solving the All Pairs Shortest Paths problem on very large graphs.
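To make the parallel propagation idea concrete, the following is a minimal Python sketch in the spirit of PROPAGATE-P, not the authors' implementation. It assumes an undirected graph given as an adjacency dictionary, stores the $s$ bits of each node in a Python integer, and uses simple estimators (no interpolation for the effective diameter). The function name `propagate_parallel`, the parameter `tau`, and the returned dictionary keys are hypothetical names chosen for illustration.

```python
import random


def propagate_parallel(adj, s, tau=0.9, rng=None):
    """Illustrative seed propagation on an undirected graph.

    adj: dict mapping each node to an iterable of its neighbours (assumed format).
    s:   number of seed nodes to sample.
    tau: fraction of reachable pairs used for the effective diameter (assumed 0.9).
    """
    rng = rng or random.Random(0)
    nodes = list(adj)
    seeds = rng.sample(nodes, min(s, len(nodes)))

    # Bit i of mask[v] is set once seed i is known to be within h hops of v.
    mask = {v: 0 for v in nodes}
    for i, seed in enumerate(seeds):
        mask[seed] |= 1 << i

    # counts[h] = number of (seed, node) pairs at distance at most h.
    counts = [sum(bin(m).count("1") for m in mask.values())]
    while True:
        new_mask = {}
        for v in nodes:
            m = mask[v]
            for u in adj[v]:
                m |= mask[u]  # one bitwise OR advances all seeds by one hop
            new_mask[v] = m
        total = sum(bin(m).count("1") for m in new_mask.values())
        if total == counts[-1]:  # no new pair discovered: propagation is done
            break
        counts.append(total)
        mask = new_mask

    # Largest eccentricity among the sampled seeds (a lower bound on the diameter).
    diameter = len(counts) - 1
    # Smallest hop count that covers a tau fraction of the reachable pairs.
    effective = next(h for h, c in enumerate(counts) if c >= tau * counts[-1])
    # Average distance over reachable pairs of distinct nodes.
    reachable = counts[-1] - counts[0]
    weighted = sum(h * (counts[h] - counts[h - 1]) for h in range(1, len(counts)))
    average = weighted / reachable if reachable else 0.0
    return {
        "diameter": diameter,
        "effective_diameter": effective,
        "average_distance": average,
        "counts": counts,
    }
```

When every node is used as a seed, the sketch is exact: on a path with five nodes it reports a diameter of 4 and an average distance of 2. With fewer seeds the values are sample-based estimates, and the sequential variant would instead run one such propagation per seed with a single bit per node.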