Across a variety of ranking tasks, researchers use reciprocal rank to measure effectiveness for users interested in exactly one relevant item. Despite its widespread use, evidence suggests that reciprocal rank is brittle when discriminating between systems. This brittleness is compounded in modern evaluation settings, where current high-precision systems may be difficult to distinguish. We address the lack of sensitivity of reciprocal rank by introducing the concept of best-case retrieval and connecting reciprocal rank to it. Best-case retrieval is an evaluation method that assesses the quality of a ranking for the most satisfied possible user across possible recall requirements. This perspective allows us to generalize reciprocal rank and define a new preference-based evaluation we call lexicographic precision, or lexiprecision. By mathematical construction, we ensure that lexiprecision preserves the differences detected by reciprocal rank, while empirically improving sensitivity and robustness across a broad set of retrieval and recommendation tasks.
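To make the idea concrete, the following is a minimal Python sketch of reciprocal rank next to one plausible reading of a lexicographic preference between two rankings. The abstract does not spell out the comparison rule, so the rule used here is an assumption: compare the rank positions of relevant items from the top down, and let the first difference decide. The function names (`relevant_positions`, `reciprocal_rank`, `lexi_preference`) and the handling of rankings that retrieve different numbers of relevant items are likewise hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Set


def relevant_positions(ranking: Sequence[str], relevant: Set[str]) -> List[int]:
    """Return the 1-based ranks of the relevant items, in ascending order."""
    return [i for i, item in enumerate(ranking, start=1) if item in relevant]


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    positions = relevant_positions(ranking, relevant)
    return 1.0 / positions[0] if positions else 0.0


def lexi_preference(
    ranking_a: Sequence[str], ranking_b: Sequence[str], relevant: Set[str]
) -> int:
    """Assumed lexicographic preference: +1 if A is preferred, -1 if B, 0 if tied.

    Positions of relevant items are compared pairwise from the top of the
    ranking; the first pair that differs decides the preference.
    """
    pos_a = relevant_positions(ranking_a, relevant)
    pos_b = relevant_positions(ranking_b, relevant)
    for a, b in zip(pos_a, pos_b):
        if a != b:
            return 1 if a < b else -1
    # Assumption: if all shared positions agree, the ranking that retrieved
    # more relevant items is preferred.
    if len(pos_a) != len(pos_b):
        return 1 if len(pos_a) > len(pos_b) else -1
    return 0


relevant = {"d1", "d2"}
run_a = ["d1", "x", "d2", "y"]
run_b = ["d1", "x", "y", "d2"]

# Both runs place their first relevant item at rank 1, so reciprocal rank ties.
print(reciprocal_rank(run_a, relevant), reciprocal_rank(run_b, relevant))
# The second relevant item breaks the tie in favor of run_a.
print(lexi_preference(run_a, run_b, relevant))
```

Under this assumed rule, any pair of rankings that reciprocal rank already separates is ordered the same way, because the first relevant position is compared first. Later relevant positions only matter when reciprocal rank ties.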