Personalized PageRank (PPR) is a classic measure of node proximity on large graphs. For a pair of nodes $s$ and $t$, the PPR value $\pi_s(t)$ equals the probability that an $\alpha$-discounted random walk from $s$ (one that stops at each step with probability $\alpha$) terminates at $t$, and it reflects the importance between $s$ and $t$ in a bidirectional way. As a generalization of Google's celebrated PageRank centrality, PPR has been extensively studied and applied in many fields, including network analysis, graph mining, and graph machine learning. Despite decades of research on PPR, computing it efficiently remains challenging, and systematic summaries and comparisons of existing algorithms are scarce. In this paper, we review several frequently used techniques for PPR computation and present a comprehensive survey of recent PPR algorithms from an algorithmic perspective. We classify these approaches by the types of queries they address and review their methodologies and contributions. We also discuss representative algorithms for computing PPR on dynamic graphs and in parallel or distributed environments.
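To make the random-walk definition of $\pi_s(t)$ concrete, the following is a minimal sketch of two textbook estimators: summing the walk-termination series by power iteration, and sampling $\alpha$-discounted walks (Monte Carlo). It is an illustration only, not an algorithm from the survey. The function names `ppr_power_iteration` and `ppr_monte_carlo` are hypothetical. The sketch assumes a graph given as a dict of out-neighbour lists, in which every node has at least one out-neighbour, and a walk that moves to a uniformly random out-neighbour whenever it does not stop.

```python
import random
from collections import Counter


def ppr_power_iteration(out_nbrs, s, alpha=0.2, tol=1e-10):
    """Single-source PPR vector pi_s via the series
    pi_s = sum_{i>=0} alpha * (1 - alpha)^i * e_s P^i,
    where P moves uniformly to an out-neighbour (assumed walk model)."""
    pi = {}
    x = {s: 1.0}  # mass of walks still alive after i steps: (1-alpha)^i * e_s P^i
    while sum(x.values()) > tol:
        nxt = {}
        for u, mass in x.items():
            # A walk currently at u stops here with probability alpha ...
            pi[u] = pi.get(u, 0.0) + alpha * mass
            nbrs = out_nbrs[u]
            if not nbrs:
                raise ValueError(f"dangling node {u!r}: sketch assumes out-degree >= 1")
            # ... otherwise it moves to a uniformly random out-neighbour.
            share = (1 - alpha) * mass / len(nbrs)
            for v in nbrs:
                nxt[v] = nxt.get(v, 0.0) + share
        x = nxt
    return pi


def ppr_monte_carlo(out_nbrs, s, alpha=0.2, num_walks=100_000, seed=0):
    """Estimate pi_s(t) as the fraction of alpha-discounted walks from s ending at t."""
    rng = random.Random(seed)
    ends = Counter()
    for _ in range(num_walks):
        u = s
        while rng.random() >= alpha:  # continue with probability 1 - alpha
            u = rng.choice(out_nbrs[u])
        ends[u] += 1  # the walk terminated at u
    return {t: c / num_walks for t, c in ends.items()}


# Two-node cycle 0 <-> 1: pi_0(0) = alpha / (1 - (1 - alpha)^2) = 5/9 for alpha = 0.2.
graph = {0: [1], 1: [0]}
exact = ppr_power_iteration(graph, 0, alpha=0.2)
approx = ppr_monte_carlo(graph, 0, alpha=0.2)
print({t: round(p, 4) for t, p in sorted(exact.items())})  # {0: 0.5556, 1: 0.4444}
print({t: round(p, 2) for t, p in sorted(approx.items())})
```

Power iteration gives a deterministic answer whose truncation error is at most `tol`. Monte Carlo is simpler and easy to parallelize, but its error shrinks only with the square root of the number of walks. That trade-off is one reason the literature combines and refines such basic techniques.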