PageRank Centrality in Directed Graphs with Bounded In-Degree

From arXiv: full version of a SODA 2026 paper; 25 pages, 2 figures. v2: revised the discussion of contributions and related work; added explanations of the exponent γ.

We study the computational complexity of locally estimating a node's PageRank centrality in a directed graph $G$. For any node $t$, its PageRank centrality $π(t)$ is defined as the probability that a random walk in $G$, starting from a uniformly chosen node, terminates at $t$, where each step terminates with a constant probability $α\in(0,1)$. To obtain a multiplicative $\big(1\pm O(1)\big)$-approximation of $π(t)$ with probability $Ω(1)$, the previous best upper bound is $O(n^{1/2}\min\{ Δ_{in}^{1/2},Δ_{out}^{1/2},m^{1/4}\})$ from [Wang, Wei, Wen, Yang, STOC '24], where $n$ and $m$ denote the number of nodes and edges in $G$, and $Δ_{in}$ and $Δ_{out}$ upper-bound the in-degrees and out-degrees of $G$, respectively.

Using a refinement of the proof in the same paper, we establish a lower bound of $Ω(n^{1/2}\min\{Δ_{in}^{1/2}/n^γ,Δ_{out}^{1/2}/n^γ,m^{1/4}\})$, where $γ=\frac{1}{2}(2\max\{\log_{1/(1-α)}Δ_{in},1\}-1)^{-1}$. Since $γ$ depends only on $Δ_{in}$, and $n^γ=O(1)$ when $Δ_{in}=Ω\left(n^{Ω(1)}\right)$, the known upper bound is tight if the complexity is parameterized only by $n$, $m$, and $Δ_{out}$. However, a gap of $Ω(n^γ)$ remains when $Δ_{in}$ is taken into account, and this gap is large when $Δ_{in}$ is small. In the extreme case $Δ_{in}\le1/(1-α)$, we have $γ=1/2$, which leaves a gap of $Ω(n^{1/2})$ between the bounds $O(n^{1/2})$ and $Ω(1)$.

In this paper, we present a new algorithm that matches the above lower bound up to logarithmic factors. The algorithm assumes that $n$ and the bounds $Δ_{in}$ and $Δ_{out}$ are known in advance. Our key technique is a novel randomized backward propagation process that propagates only selectively, based on Monte Carlo estimates of PageRank scores.
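To make the definitions above concrete, here is a minimal Python sketch, under stated assumptions, of three quantities the abstract mentions: the exponent γ from the lower bound, a plain Monte Carlo estimate of π(t) that follows the random-walk definition directly, and a standard deterministic backward (reverse) push toward t. The push shown here is the textbook baseline, not the paper's randomized selective propagation, which the abstract does not specify. The function names, the adjacency-list representation, and the assumption that every node has at least one out-neighbor are hypothetical choices made for illustration.

```python
import math
import random


def gamma_exponent(alpha, delta_in):
    """Exponent from the lower bound:
    gamma = 1/2 * (2 * max(log_{1/(1-alpha)} delta_in, 1) - 1) ** -1.
    Assumes 0 < alpha < 1 and delta_in >= 1."""
    log_val = math.log(delta_in) / math.log(1.0 / (1.0 - alpha))
    return 0.5 / (2.0 * max(log_val, 1.0) - 1.0)


def monte_carlo_pagerank(out_edges, t, alpha, num_walks, rng):
    """Fraction of walks (uniform start, stop w.p. alpha per step) that end at t.
    Hypothetical helper; assumes every node has at least one out-neighbor."""
    n = len(out_edges)
    hits = 0
    for _ in range(num_walks):
        v = rng.randrange(n)              # uniformly chosen start node
        while rng.random() >= alpha:      # keep walking with probability 1 - alpha
            v = rng.choice(out_edges[v])  # move to a uniform out-neighbor
        hits += (v == t)
    return hits / num_walks


def backward_push_pagerank(out_edges, t, alpha, r_max):
    """Standard deterministic reverse push toward t (NOT the paper's
    randomized selective variant). Hypothetical helper; assumes r_max < 1.
    Invariant: pi_s(t) = p[s] + sum_v pi_s(v) * r[v] with all r[v] >= 0,
    so the returned mean(p) satisfies mean(p) <= pi(t) <= mean(p) + r_max."""
    n = len(out_edges)
    in_edges = [[] for _ in range(n)]
    for u, outs in enumerate(out_edges):
        for v in outs:
            in_edges[v].append(u)
    p = [0.0] * n      # settled part of pi_s(t) for each source s
    r = [0.0] * n      # residual mass not yet propagated
    r[t] = 1.0
    active = [t]       # nodes whose residual currently exceeds r_max
    while active:
        v = active.pop()
        rho = r[v]
        if rho <= r_max:
            continue                          # defensive guard; residual already small
        r[v] = 0.0
        p[v] += alpha * rho                   # settle the alpha share of v's residual
        for u in in_edges[v]:                 # pass the rest back along edges u -> v
            before = r[u]
            r[u] += (1.0 - alpha) * rho / len(out_edges[u])
            if before <= r_max < r[u]:
                active.append(u)
    return sum(p) / n
```

For example, on the four-node graph `[[1, 2], [2], [0], [0, 2]]` with `alpha = 0.3` and target `t = 2`, both estimators should land near 0.367: the push result within `r_max` below the true value, the Monte Carlo result within sampling noise. The Monte Carlo estimator needs roughly on the order of 1/π(t) walks for a constant-factor estimate, which is why sublinear methods combine it with backward propagation.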


Related content

PageRank


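PageRank (also called web page ranking, page level, Google left-side ranking, or Page ranking) is a technique [1] computed from the hyperlinks between web pages. As one of the factors used to rank web pages, it is named after the surname of Google co-founder Larry Page. Google uses it to reflect the relevance and importance of web pages, and in search engine optimization it is often used as one of the factors for evaluating how effective a page's optimization is. Google's founders Larry Page and Sergey Brin invented the technique at Stanford University in 1998.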
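As a reference point for "computed from the hyperlinks", here is a minimal power-iteration sketch in Python (an illustration, not Google's production method) that computes the whole PageRank vector under the definition used in the abstract: π = α·u + (1 − α)·πP, where u is the uniform distribution, P is the random-walk transition matrix, and α is the per-step termination probability. The function name and adjacency-list input are hypothetical, and the sketch assumes every node has at least one out-link, since the source does not say how dangling nodes are handled.

```python
def pagerank_power_iteration(out_edges, alpha, tol=1e-12, max_iter=10_000):
    """Iterate pi <- alpha * u + (1 - alpha) * pi P until the L1 change < tol.
    Hypothetical helper; assumes every node has at least one out-link."""
    n = len(out_edges)
    pi = [1.0 / n] * n                     # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [alpha / n] * n              # walks that stop right at their start node
        for u, outs in enumerate(out_edges):
            share = (1.0 - alpha) * pi[u] / len(outs)
            for v in outs:                 # continue along each out-link equally
                nxt[v] += share
        diff = sum(abs(a - b) for a, b in zip(nxt, pi))
        pi = nxt
        if diff < tol:                     # contraction factor is (1 - alpha) in L1
            break
    return pi
```

For instance, in the graph `[[1, 2], [2], [0], [0, 2]]` with `alpha = 0.3`, node 3 has no in-links, so its score is exactly `alpha / n = 0.075`.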

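Latest report: "Application of Multilayer-Network PageRank Algorithms in the Analysis of Defense Critical Infrastructure" (June 22, 2025)

"High-Precision Center Estimation for Infrared Point-Source Targets", 20-page report, U.S. Army Research Laboratory (June 13, 2023)

[NeurIPS22] A Node-Level Transformer with Linear Complexity on Large Graphs (November 29, 2022)

How can knowledge graph embedding be understood mathematically? A new survey from Sun Yat-sen University and collaborators, "Knowledge Graph Embedding: A Representation Space Perspective", a 32-page PDF covering KGE from algebraic, geometric, and analytic viewpoints (November 8, 2022)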