Ranking functions such as PageRank assign numeric values (ranks) to the nodes of a graph, most notably the web graph. Node rankings are an integral part of Internet search algorithms, since they can be used to order query results. However, these ranking functions are famously vulnerable to attacks by spammers, who modify the web graph to give their own pages more rank. We characterize the interplay between rankers and spammers as a game. We define the two critical features of this game, spam resistance and distortion, based on how spammers spam and how rankers protect against spam. We observe that all the ranking functions that are well studied in the literature, including the original formulation of PageRank, have poor spam resistance, poor distortion, or both. Finally, we study Min-PPR, the form of PageRank used at Google itself, which has nevertheless received no theoretical or empirical treatment in the literature. We prove that Min-PPR has low distortion and high spam resistance. A secondary benefit is that Min-PPR comes with an explicit cost function on nodes that shows how important each node is to the spammer; a ranker can therefore focus its spam-detection capacity on these vulnerable nodes. Both Min-PPR and its associated cost function are straightforward to compute.
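To make the attack model concrete, the following is a minimal sketch of the classic PageRank power iteration on a toy graph, showing how a spammer who adds pages linking to a target page raises that page's rank. It illustrates the original PageRank formulation only, not Min-PPR, whose definition is not given in this abstract. The function name `pagerank`, the damping factor of 0.85, the uniform handling of dangling nodes, and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]],
             damping: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Classic PageRank by power iteration (illustrative sketch).

    `graph` maps each node to the list of nodes it links to.
    Nodes without out-links spread their rank uniformly (an assumption).
    """
    # Collect every node that appears as a source or a target.
    nodes = sorted(set(graph) | {v for outs in graph.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}

    for _ in range(iters):
        # Rank held by nodes with no out-links is redistributed evenly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        nxt = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = graph.get(u, [])
            for v in outs:
                nxt[v] += damping * rank[u] / len(outs)
        rank = nxt
    return rank


# Hypothetical toy web graph: "spam" is the page the spammer wants to promote.
honest = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "spam": ["a"]}

# The spammer modifies the graph by adding five farm pages that link to "spam".
farmed = dict(honest)
for i in range(5):
    farmed[f"farm{i}"] = ["spam"]

before = pagerank(honest)
after = pagerank(farmed)
print(f"rank of 'spam' before: {before['spam']:.4f}, after: {after['spam']:.4f}")
```

In this toy setting the farm pages cost the spammer nothing beyond creating them, which is the kind of cheap manipulation that the notion of spam resistance is meant to rule out.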