We study the computational complexity of locally estimating a node's PageRank centrality in a directed graph $G$. For any node $t$, its PageRank centrality $\pi(t)$ is defined as the probability that a random walk in $G$, starting from a uniformly chosen node, terminates at $t$, where each step terminates with a constant probability $\alpha\in(0,1)$. To obtain a multiplicative $\big(1\pm O(1)\big)$-approximation of $\pi(t)$ with probability $\Omega(1)$, the previous best upper bound is $O(n^{1/2}\min\{\Delta_{in}^{1/2},\Delta_{out}^{1/2},m^{1/4}\})$ from [Wang, Wei, Wen, Yang, STOC '24], where $n$ and $m$ denote the number of nodes and edges in $G$, and $\Delta_{in}$ and $\Delta_{out}$ upper bound the in-degrees and out-degrees of $G$, respectively. Using a refinement of the proof in the same paper, we establish a lower bound of $\Omega(n^{1/2}\min\{\Delta_{in}^{1/2}/n^\gamma,\Delta_{out}^{1/2}/n^\gamma,m^{1/4}\})$, where $\gamma=\frac{1}{2}(2\max\{\log_{1/(1-\alpha)}\Delta_{in},1\}-1)^{-1}$. Since $\gamma$ depends only on $\Delta_{in}$, and $n^\gamma=O(1)$ for $\Delta_{in}=\Omega\left(n^{\Omega(1)}\right)$, the known upper bound is tight if we parameterize the complexity only by $n$, $m$, and $\Delta_{out}$. However, a gap of $\Omega(n^\gamma)$ remains when $\Delta_{in}$ is also taken into account, and this gap is large when $\Delta_{in}$ is small. In the extreme case where $\Delta_{in}\le 1/(1-\alpha)$, we have $\gamma=1/2$, leading to a gap of $\Omega(n^{1/2})$ between the bounds $O(n^{1/2})$ and $\Omega(1)$. In this paper, we present a new algorithm that achieves the above lower bound (up to logarithmic factors). The algorithm assumes that $n$ and the bounds $\Delta_{in}$ and $\Delta_{out}$ are known in advance. Our key technique is a novel randomized backwards propagation process that propagates only selectively, based on Monte Carlo estimates of PageRank scores.
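To make the definition of $\pi(t)$ and the exponent $\gamma$ concrete, here is a minimal Python sketch. It is not the algorithm of this paper or of [Wang, Wei, Wen, Yang, STOC '24]. The helper names `pagerank_exact` (a fixed-point iteration of $\pi = \frac{\alpha}{n}\mathbf{1} + (1-\alpha)P^{\top}\pi$), `pagerank_monte_carlo` (naive sampling of $\alpha$-terminated walks from uniform start nodes), and `gamma_exponent` are hypothetical, and the sketch assumes every node has at least one out-edge, since the handling of dangling nodes is not specified here.

```python
import math
import random


def pagerank_exact(out_edges, alpha, iters=200):
    """Fixed-point iteration pi = (alpha/n) * 1 + (1 - alpha) * P^T pi.

    pi[t] is the probability that a walk from a uniform start node, stopping
    with probability alpha at each step, ends at t.
    Assumption: every node has at least one out-edge (no dangling nodes).
    """
    n = len(out_edges)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [alpha / n] * n                  # mass of walks stopping at step 0
        for u, nbrs in enumerate(out_edges):
            share = (1 - alpha) * pi[u] / len(nbrs)  # continue, move to a random out-neighbor
            for v in nbrs:
                nxt[v] += share
        pi = nxt
    return pi


def pagerank_monte_carlo(out_edges, alpha, t, num_walks, rng):
    """Naive estimator: fraction of alpha-terminated walks that end at t."""
    n = len(out_edges)
    hits = 0
    for _ in range(num_walks):
        u = rng.randrange(n)                   # uniformly chosen start node
        while rng.random() >= alpha:           # keep walking with probability 1 - alpha
            u = rng.choice(out_edges[u])
        if u == t:
            hits += 1
    return hits / num_walks


def gamma_exponent(alpha, d_in):
    """gamma = 1/2 * (2 * max(log_{1/(1-alpha)} d_in, 1) - 1)^{-1}, for d_in >= 1."""
    log_term = math.log(d_in, 1 / (1 - alpha))
    return 0.5 / (2 * max(log_term, 1.0) - 1)
```

For example, on the three-node graph `[[1, 2], [2], [0]]` with `alpha = 0.2`, `pagerank_monte_carlo(..., t=2, num_walks=20000, rng=random.Random(0))` should land close to `pagerank_exact(...)[2]`, and `gamma_exponent(0.2, 1)` returns `0.5`, matching the extreme case $\Delta_{in}\le 1/(1-\alpha)$. For a node with $\pi(t)\approx 1/n$, this naive sampler needs on the order of $n$ walks before it sees $t$ even a constant number of times, which is the baseline that local estimators aim to beat.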