The challenges of graph stream algorithms are twofold. First, each edge may be processed only once; second, the algorithm must operate within highly constrained memory. Diffusion degree is a measure of node centrality that, for static graphs, can be computed for all nodes trivially with a single Breadth-First Search (BFS). However, keeping track of diffusion degree in a graph stream is nontrivial: the memory required for exact calculation is equivalent to keeping the whole graph in memory. This paper proposes an estimator (or sketch) of diffusion degree for graph streams. We prove the correctness of the proposed sketch and an upper bound on its estimation error. Given $\epsilon, \delta \in (0,1)$, we achieve an error below $\epsilon(b_u-a_u)d_u\lambda$ at node $u$ with probability $1-\delta$ using $O(n\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ space, where $b_u$ and $a_u$ are the maximum and minimum degrees of the neighbors of $u$, $\lambda$ is the diffusion probability, and $d_u$ is the degree of node $u$. With the help of this sketch, we propose an algorithm to extract the top-$k$ influential nodes in the graph stream. Comparative experiments show that the spread achieved by the top-$k$ nodes from the proposed graph stream algorithm is comparable to or better than that of the top-$k$ nodes extracted by the exact algorithm.
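The abstract does not spell out the sketch, so the following Python is only an illustrative sketch of one construction that is consistent with the stated bound, not necessarily the authors' method. It assumes the commonly used definition of diffusion degree with a uniform probability, $C_{DD}(u) = \lambda d_u + \lambda \sum_{v \in N(u)} d_v$, and a stream of distinct undirected edges. Each node keeps an exact degree counter plus a reservoir sample of $k$ neighbor IDs, where $k = \lceil \ln(2/\delta) / (2\epsilon^2) \rceil$ follows from Hoeffding's inequality applied to the mean neighbor degree. All names (`sample_size`, `DiffusionDegreeSketch`, `exact_diffusion_degree`, `top_k`) are hypothetical, and `top_k` simply ranks nodes by their estimate, which may differ from the paper's extraction algorithm.

```python
import heapq
import math
import random
from collections import defaultdict


def sample_size(eps, delta):
    """Hoeffding sample size k with 2*exp(-2*k*eps^2) <= delta (assumed derivation)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


class DiffusionDegreeSketch:
    """Hypothetical sketch: exact degree counters plus per-node neighbor reservoirs."""

    def __init__(self, eps, delta, lam, seed=None):
        self.k = sample_size(eps, delta)    # reservoir capacity per node
        self.lam = lam                      # uniform diffusion probability (assumption)
        self.rng = random.Random(seed)
        self.degree = defaultdict(int)      # exact degrees: O(n) counters
        self.reservoir = defaultdict(list)  # at most k neighbor IDs per node

    def _offer(self, u, v):
        """Record v as a new neighbor of u using reservoir sampling (Algorithm R)."""
        self.degree[u] += 1
        res = self.reservoir[u]
        if len(res) < self.k:
            res.append(v)
        else:
            j = self.rng.randrange(self.degree[u])
            if j < self.k:
                res[j] = v

    def add_edge(self, u, v):
        """Process one undirected edge exactly once."""
        self._offer(u, v)
        self._offer(v, u)

    def estimate(self, u):
        """Estimate lam * (d_u + sum of neighbor degrees) from the sampled neighbors."""
        d_u = self.degree.get(u, 0)
        if d_u == 0:
            return 0.0
        res = self.reservoir[u]
        mean_nbr_deg = sum(self.degree[v] for v in res) / len(res)
        return self.lam * (d_u + d_u * mean_nbr_deg)

    def top_k(self, count):
        """Rank nodes by estimated diffusion degree (simple ranking, an assumption)."""
        return heapq.nlargest(count, list(self.degree), key=self.estimate)


def exact_diffusion_degree(edges, lam):
    """Exact values on a static simple graph, storing the full adjacency."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {u: lam * (len(adj[u]) + sum(len(adj[v]) for v in adj[u])) for u in adj}
```

Under these assumptions the sketch stores at most $k$ neighbor IDs per node, which matches the $O(n\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ space figure. Whenever a node has at most $k$ neighbors its reservoir holds all of them, so the estimate coincides with the exact value; sampling error only appears for higher-degree nodes.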