A large number of approaches to Query Performance Prediction (QPP) have been proposed over the last two decades. As early as 2009, Hauff et al. [28] explored whether different QPP methods can be combined to improve prediction quality. Since then, significant research has been done on both QPP approaches and their evaluation. This study revisits Hauff et al.'s work to assess the reproducibility of their findings in light of new prediction methods, evaluation metrics, and datasets. We expand the scope of the earlier investigation by: (i) considering post-retrieval methods, including supervised neural techniques (only pre-retrieval techniques were studied in [28]); (ii) using sMARE for evaluation, in addition to the traditional correlation coefficients and RMSE; and (iii) experimenting with additional datasets (ClueWeb09B and TREC DL). Our results largely support previous claims, but we also present several interesting findings. We interpret these findings by taking a more nuanced look at the correlation between QPP methods, examining whether they capture diverse information or rely on overlapping factors.
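Unlike correlation coefficients, sMARE (scaled Mean Absolute Rank Error) scores a predictor per query before averaging. The sketch below is a minimal illustration, assuming the common formulation: rank the queries once by predicted score and once by actual effectiveness (e.g. AP), divide each query's absolute rank difference by the number of queries, and take the mean. Lower is better, and 0 means the two rankings agree exactly. The function names are hypothetical, and resolving ties with average ranks is our assumption rather than a detail stated here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their average rank.

    Tie handling by average rank is an assumption of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def smare(predicted, actual):
    """Scaled Mean Absolute Rank Error (hypothetical helper).

    predicted: QPP scores, one per query.
    actual: observed effectiveness (e.g. AP), one per query, same order.
    Returns the mean over queries of |rank_pred - rank_actual| / |Q|.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equal-length, non-empty score lists")
    n = len(predicted)
    rank_pred = average_ranks(predicted)
    rank_act = average_ranks(actual)
    # Per-query scaled absolute rank error, averaged over all queries.
    return mean(abs(p - a) / n for p, a in zip(rank_pred, rank_act))


if __name__ == "__main__":
    # Toy data for four queries (illustrative only, not results from this study).
    print(smare([0.9, 0.1, 0.5, 0.3], [0.8, 0.2, 0.4, 0.6]))
```

In this toy example, two of the four queries swap adjacent ranks. Each contributes 1/4 and the other two contribute 0, giving an sMARE of 0.125. Because the per-query errors are kept before averaging, they can also feed per-query statistical tests, which a single correlation value cannot.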