Estimating Random-Walk Probabilities in Directed Graphs

From arXiv, v4: Provides an $O(m\log n)$ upper bound for estimating $π(s,t)$ regardless of how small $π(s,t)$ is (i.e., it addresses the case where the relative error threshold is $δ = 0$). The previous upper bound was $O(m\log(1/δ))$.

We study discounted random walks in directed graphs. In each step, the walk either terminates with a constant probability $α$ or proceeds to a random out-neighbor. Our goal is to estimate the probability $π(s, t)$ that a discounted random walk starting from $s$ terminates at $t$. This probability is also known as the Personalized PageRank (PPR) score, which measures the relevance of $t$ to $s$, for instance, when $s$ and $t$ are web pages on the Internet. We aim to estimate $π(s, t)$ within a constant relative error with constant probability. A variety of algorithms have been developed for several problem variants, such as single-pair, single-source, single-target, and single-node estimation, under both worst-case and average-case settings, and for different combinations of allowed graph queries. However, in many important cases, polynomial gaps remain between the known upper and lower bounds. In this paper, we establish tight upper and lower bounds (up to factors logarithmic in $n$) for all problem variants and query combinations, closing all existing gaps in both the worst-case and average-case settings. Below we give some examples for the worst-case setting. As an upper-bound example, the classic power method estimates $π(s,t)$ in $O(m\log(1/δ))$ time if $π(s,t)$ is above a threshold $δ$, but $π(s,t)$ can be as small as $1/n^{Θ(n)}$. In contrast, we propose algorithms that deterministically estimate arbitrarily small $π(s,t)$ in $O(m\log n)$ time. As a lower-bound example, we improve the lower bound for the single-pair problem from $Ω(\min\{n,1/δ\})$ to $Ω(\min\{m,1/δ\})$, which is optimal (up to logarithmic factors) since a simple Monte Carlo estimate takes $O(1/δ)$ time.
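To make the two baselines named in the abstract concrete, here is a minimal Python sketch of a Monte Carlo estimator and a truncated power iteration for $π(s, t)$. It is an illustration only, not the paper's $O(m\log n)$ algorithm. The function names `monte_carlo_ppr` and `power_iteration_ppr`, the adjacency-dictionary graph representation, and the assumption that every node has at least one out-neighbor are all choices made for this example and do not come from the paper.

```python
import random


def monte_carlo_ppr(out_neighbors, s, t, alpha, num_walks, rng=None):
    """Estimate pi(s, t) as the fraction of discounted walks from s that stop at t.

    Assumes (for this sketch) that every node has at least one out-neighbor.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(num_walks):
        v = s
        # Each step: terminate with probability alpha, else move to a
        # uniformly random out-neighbor.
        while rng.random() >= alpha:
            v = rng.choice(out_neighbors[v])
        hits += (v == t)
    return hits / num_walks


def power_iteration_ppr(out_neighbors, s, alpha, num_iters):
    """Approximate pi(s, .) by summing the first num_iters steps of the walk.

    After k iterations the unaccounted probability mass is (1 - alpha)^k,
    so resolving values above delta takes O(log(1/delta)) passes over the edges.
    """
    alive = {s: 1.0}  # mass of walks that have not terminated yet, per node
    estimate = {}
    for _ in range(num_iters):
        next_alive = {}
        for v, mass in alive.items():
            # A fraction alpha of the mass at v terminates here.
            estimate[v] = estimate.get(v, 0.0) + alpha * mass
            # The rest is split evenly among the out-neighbors of v.
            share = (1.0 - alpha) * mass / len(out_neighbors[v])
            for w in out_neighbors[v]:
                next_alive[w] = next_alive.get(w, 0.0) + share
        alive = next_alive
    return estimate


if __name__ == "__main__":
    # Hypothetical toy graph: a directed 3-cycle 0 -> 1 -> 2 -> 0.
    graph = {0: [1], 1: [2], 2: [0]}
    alpha = 0.2
    # On this cycle, pi(0, 1) = alpha (1 - alpha) / (1 - (1 - alpha)^3).
    exact = alpha * (1 - alpha) / (1 - (1 - alpha) ** 3)
    print("exact      ", exact)
    print("power iter ", power_iteration_ppr(graph, 0, alpha, 200)[1])
    print("monte carlo", monte_carlo_ppr(graph, 0, 1, alpha, 20000))
```

The Monte Carlo estimator needs on the order of $1/δ$ walks to see a target with $π(s,t) ≈ δ$ at all, which is why it is useless for the very small values mentioned in the abstract, while the power iteration pays a full pass over the edges per iteration.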
