Rank-Biased Overlap (RBO) is a similarity measure for indefinite rankings: it is top-weighted, and it can be computed when only a prefix of the rankings is known or when the rankings have only some items in common. It is widely used, for instance, to analyze differences between search engines by comparing the rankings of documents they retrieve for the same queries. In these situations, though, it is very common to find tied documents that have the same score. Unfortunately, the treatment of ties in RBO remains superficial and incomplete, in the sense that it is not clear how to calculate it from the ranking prefixes only. In addition, the existing way of dealing with ties is very different from the one traditionally followed in the field of Statistics, most notably in rank correlation coefficients such as Kendall's and Spearman's. In this paper we propose a generalized formulation of RBO to handle ties, thanks to which we complete the original definitions by showing how to perform prefix evaluation. We also use it to fully develop two variants that align with the ones found in the Statistics literature: one for when there is a reference ranking to compare to, and one for when there is not. Overall, these three variants provide researchers with flexibility when comparing rankings with RBO, by clearly determining what ties mean and how they should be treated. Finally, using both synthetic and TREC data, we demonstrate the use of these new tie-aware RBO measures. We show that the scores may differ substantially from the original tie-unaware RBO measure, where ties had to be broken at random or by arbitrary criteria such as document ID. Taken together, these results show the need for a proper account of ties in rank similarity measures such as RBO.
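To make the tie problem concrete, the following is a minimal sketch of the original tie-unaware computation, not of the tie-aware variants proposed in this paper. It assumes the standard truncated form of RBO, (1 - p) · Σ_{d=1..k} p^(d-1) · |S[:d] ∩ L[:d]| / d, evaluated over the shared prefix depth k with no extrapolation of the unseen tail. The function name `rbo_truncated`, the persistence value `p`, and the toy document IDs are illustrative assumptions.

```python
def rbo_truncated(s, l, p=0.9):
    """Truncated, tie-unaware RBO sum over the shared prefix of two rankings.

    Assumes each ranking is a sequence of unique item IDs with no ties,
    and ignores the contribution of the unseen tail (no extrapolation).
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    depth = min(len(s), len(l))
    seen_s, seen_l = set(), set()
    overlap = 0      # size of the intersection of the two prefixes at depth d
    total = 0.0
    for d in range(1, depth + 1):
        x, y = s[d - 1], l[d - 1]
        if x == y:
            overlap += 1
        else:
            # Each new item counts if it already appeared in the other prefix.
            overlap += (x in seen_l) + (y in seen_s)
        seen_s.add(x)
        seen_l.add(y)
        total += p ** (d - 1) * overlap / d   # top-weighted agreement at depth d
    return (1 - p) * total


# Hypothetical case: a system scores "b" and "c" equally, so the tie must be
# broken before the tie-unaware measure can be applied.
reference = ["a", "c", "b"]
by_doc_id = ["a", "b", "c"]   # tie broken by document ID
other_way = ["a", "c", "b"]   # tie broken the opposite way

score_by_id = rbo_truncated(by_doc_id, reference, p=0.5)
score_other = rbo_truncated(other_way, reference, p=0.5)
print(score_by_id, score_other)
```

Under these assumptions the two tie-breaking choices give different scores against the same reference ranking, even though the underlying system output is identical. This is the arbitrariness that a tie-aware formulation is meant to remove.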