The traditional use case of query performance prediction (QPP) is to identify which queries perform well and which perform poorly for a given ranking model. A more fine-grained, and arguably more challenging, extension of this task is to determine which ranking models are most effective for a given query. In this work, we generalize the QPP task and its evaluation into three settings: (i) SingleRanker MultiQuery (SRMQ-PP), which corresponds to the standard use case; (ii) MultiRanker SingleQuery (MRSQ-PP), which evaluates a QPP model's ability to select the most effective ranker for a query; and (iii) MultiRanker MultiQuery (MRMQ-PP), which considers predictions jointly across all query-ranker pairs. Our results show that (a) the relative effectiveness of QPP models varies substantially between tasks (SRMQ-PP vs. MRSQ-PP), and (b) predicting the best ranker for a query is considerably more difficult than predicting the relative difficulty of queries for a given ranker.
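To make the three settings concrete, the following is a minimal sketch of how each one could be scored. It is an illustration under assumptions, not the evaluation protocol of this work: the abstract does not name an evaluation measure, so Kendall's tau is assumed here, and the queries, ranker names (`bm25`, `dense`), and all scores are hypothetical toy values.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical toy data keyed by (query, ranker).
# true_eff: measured effectiveness (assumed to be something like nDCG).
true_eff = {
    ("q1", "bm25"): 0.2, ("q1", "dense"): 0.6,
    ("q2", "bm25"): 0.5, ("q2", "dense"): 0.4,
    ("q3", "bm25"): 0.8, ("q3", "dense"): 0.9,
}
# predicted: QPP estimates for the same pairs.
predicted = {
    ("q1", "bm25"): 0.3, ("q1", "dense"): 0.5,
    ("q2", "bm25"): 0.4, ("q2", "dense"): 0.6,
    ("q3", "bm25"): 0.7, ("q3", "dense"): 0.9,
}

queries = sorted({q for q, _ in true_eff})
rankers = sorted({r for _, r in true_eff})


def correlate(pairs):
    """Correlate predicted and true effectiveness over the given pairs."""
    return kendall_tau([predicted[p] for p in pairs],
                       [true_eff[p] for p in pairs])


def srmq(ranker):
    """SRMQ-PP: fix one ranker, compare across queries."""
    return correlate([(q, ranker) for q in queries])


def mrsq(query):
    """MRSQ-PP: fix one query, compare across rankers."""
    return correlate([(query, r) for r in rankers])


def mrmq():
    """MRMQ-PP: compare across all query-ranker pairs jointly."""
    return correlate(sorted(true_eff))


for r in rankers:
    print("SRMQ", r, srmq(r))
for q in queries:
    print("MRSQ", q, mrsq(q))
print("MRMQ", mrmq())
```

In this toy data the predictor orders the queries perfectly for `bm25`, yet it prefers the wrong ranker for `q2`, which illustrates how a predictor can look strong in one setting and weak in another.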