A Comparative Analysis of Retrievability and PageRank Measures

The accessibility of documents within a collection plays a pivotal role in Information Retrieval: it reflects how easily specific content can be located in that collection. A document can be reached in two distinct ways. The first is through a retrieval model, using a keyword or other feature-based search; the second is by navigating to it through the links associated with documents, where such links are available. Link-based metrics such as PageRank, Hub, and Authority characterize how documents can be discovered within the network of content, while retrievability quantifies the ease with which a document can be found by a retrieval model. In this paper, we compare these two perspectives, PageRank and retrievability, as measures of the importance and discoverability of content in a corpus. Through empirical experiments on benchmark datasets, we observe a subtle similarity between retrievability and PageRank, which is more pronounced for larger datasets.
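
To make the two perspectives concrete, the following is a minimal sketch on a toy corpus, not the experimental setup of the paper. It assumes the commonly used cumulative form of retrievability (a document scores one point for each query that ranks it within a cutoff) and a standard power-iteration PageRank with damping factor 0.85. The documents, links, queries, the term-count retrieval model, and all function names are hypothetical and chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {node: [outgoing link targets]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


def retrievability(docs, queries, cutoff=2):
    """Cumulative retrievability: number of queries ranking a document in the top `cutoff`."""
    scores = {doc_id: 0 for doc_id in docs}
    for query in queries:
        terms = query.split()
        matched = []
        for doc_id, text in docs.items():
            tokens = text.split()
            # Toy retrieval model (an assumption): count occurrences of query terms.
            score = sum(tokens.count(term) for term in terms)
            if score > 0:
                matched.append((score, doc_id))
        # Highest score first; ties are broken by document id for determinism.
        matched.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, doc_id in matched[:cutoff]:
            scores[doc_id] += 1
    return scores


# Hypothetical toy corpus: document texts, hyperlinks between them, and a query set.
docs = {
    "d1": "pagerank link graph",
    "d2": "retrieval model query",
    "d3": "link graph retrieval",
    "d4": "query log",
}
links = {"d1": ["d2", "d3"], "d2": ["d3"], "d3": ["d1"], "d4": ["d3"]}
queries = ["link graph", "retrieval", "query model", "pagerank"]

pr = pagerank(links)
ret = retrievability(docs, queries, cutoff=2)

for doc_id in docs:
    print(f"{doc_id}: PageRank={pr[doc_id]:.3f}  retrievability={ret[doc_id]}")
```

In this sketch the two scores come from different evidence: PageRank depends only on the link structure, whereas retrievability depends on the query set, the retrieval model, and the rank cutoff. Comparing the two orderings over a real corpus would additionally require a rank-correlation measure, which is omitted here.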
