Originally developed to rank web pages, PageRank has since been applied in many fields. Graphs in practice generally contain many special vertices, such as dangling vertices and unreferenced vertices. Existing PageRank algorithms usually treat them as "bad" vertices because they can cause problems. In this paper, however, we propose a parallel PageRank algorithm that takes advantage of these special vertices. To this end, we first interpret PageRank from an information-transmitting perspective and give a constructive definition of PageRank. Based on this interpretation, we then propose a parallel PageRank algorithm, which we call the Information Transmitting Algorithm (ITA). We prove that dangling vertices increase ITA's convergence rate and that unreferenced vertices and weak unreferenced vertices reduce ITA's computation. Compared with the Monte Carlo method, ITA has a lower bandwidth requirement. Compared with the power method, ITA has a higher convergence rate and requires fewer calculations. Finally, experimental results on four data sets demonstrate that ITA is 1.5-4 times faster than the power method and converges more uniformly.
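For orientation, the power method used as the baseline above can be sketched as follows. This is a minimal illustration of the standard power iteration, not of ITA itself. The function name `pagerank_power`, the damping value 0.85, the tolerance, and the toy graph are assumptions made for the example. It takes a dangling vertex to be one with no outgoing links and an unreferenced vertex to be one with no incoming links.

```python
def pagerank_power(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Standard power-method PageRank on an adjacency-list graph.

    out_links maps each vertex to the list of vertices it links to.
    Returns (ranks, iterations_used).
    """
    # Collect every vertex that appears as a source or as a target.
    vertices = sorted(
        set(out_links) | {t for targets in out_links.values() for t in targets}
    )
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for iteration in range(1, max_iter + 1):
        # Dangling vertices have no out-links; the usual fix is to spread
        # their rank uniformly over all vertices.
        dangling_mass = sum(rank[v] for v in vertices if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = {v: base for v in vertices}

        # Every other vertex splits its damped rank evenly among its targets.
        for v in vertices:
            targets = out_links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share

        # Stop once the L1 change between successive iterates is below tol.
        change = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if change < tol:
            break

    return rank, iteration


# Hypothetical toy graph: "c" is dangling, "d" is unreferenced.
graph = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}
ranks, iterations = pagerank_power(graph)
for vertex in sorted(ranks, key=ranks.get, reverse=True):
    print(vertex, round(ranks[vertex], 4))
```

In this sketch the unreferenced vertex `d` receives only the uniform base term in every iteration, while the dangling vertex `c` forces an extra redistribution step. These are the kinds of vertices that, according to the abstract, ITA exploits rather than works around.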