Summarizing graphs with respect to structural features is important for reducing a graph's size and making tasks such as indexing, querying, and visualization feasible. Our generic parallel BRS algorithm efficiently summarizes large graphs with respect to a custom equivalence relation $\sim$ defined on the graph's vertices $V$. Moreover, the definition of $\sim$ can be chained $k\geq 1$ times, so that the resulting equivalence relation becomes a $k$-bisimulation. We evaluate the runtime and memory performance of the BRS algorithm for $k$-bisimulation with $k=1,\ldots,10$ against two algorithms from the literature (a sequential algorithm due to Kaushik et al. and a parallel algorithm of Sch\"atzle et al.), both of which we implemented in the same software stack as BRS. We use five real-world and synthetic graph datasets containing 100 million to two billion edges. Our results show that the generic BRS algorithm outperforms the respective native bisimulation algorithms on all datasets for all $k\geq 5$, and for smaller $k$ in some cases. The BRS implementations of the two bisimulation algorithms run almost equally fast. Thus, the BRS algorithm is an effective parallelization of the sequential Kaushik et al. bisimulation algorithm.
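
To illustrate what chaining an equivalence relation $k$ times means, the following is a minimal sequential sketch of $k$-bisimulation by iterative signature refinement. It is not the BRS algorithm, nor the Kaushik et al. or Sch\"atzle et al. implementations. It assumes a forward, edge-labeled notion of bisimulation in which two vertices are equivalent after round $i$ if they were equivalent after round $i-1$ and have the same set of (edge label, successor block) pairs. The function name `k_bisimulation` and the `(source, label, target)` edge format are hypothetical choices made for this example.

```python
from collections import defaultdict


def k_bisimulation(edges, k):
    """Partition vertices by k rounds of signature refinement (illustrative sketch).

    edges: iterable of (source, label, target) triples (assumed input format).
    Returns a dict mapping each vertex to an integer block id. Block ids are
    arbitrary; only equality of ids between vertices is meaningful.
    """
    out = defaultdict(list)
    vertices = set()
    for src, label, dst in edges:
        out[src].append((label, dst))
        vertices.update((src, dst))

    # Round 0: every vertex is in the same block.
    block = {v: 0 for v in vertices}
    num_blocks = 1

    for _ in range(k):
        ids = {}
        new_block = {}
        for v in vertices:
            # Signature: the set of (edge label, current block of the target).
            sig = frozenset((lbl, block[dst]) for lbl, dst in out[v])
            # Keying on the old block as well guarantees a refinement.
            new_block[v] = ids.setdefault((block[v], sig), len(ids))
        if len(ids) == num_blocks:
            break  # no block was split, so further rounds change nothing
        block, num_blocks = new_block, len(ids)
    return block
```

Because each round only needs the block ids from the previous round, the per-vertex signature computation is independent across vertices, which is the kind of structure a parallel summarization algorithm can exploit.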