The overwhelming volume of data generated and indexed by search engines makes it challenging to retrieve documents from the index efficiently and effectively. Even with a well-crafted query, relevant documents are often buried among a multitude of competing documents, which reduces the accessibility, or 'findability', of the desired document. Consequently, it is crucial to develop a robust methodology for assessing this dimension of Information Retrieval (IR) system performance. Previous studies have measured document accessibility while disregarding user queries and document relevance, but no existing metric quantifies the findability of a document within a given IR system without resorting to manual labor. This paper addresses this gap by defining and deriving a metric that evaluates the findability of documents as perceived by end-users. Through experiments, we demonstrate that different retrieval models and collections affect the findability of documents to varying degrees. Furthermore, we establish that the findability measure is independent of retrievability, an accessibility measure introduced in prior literature.