Query performance prediction (QPP) is a core task in information retrieval. The QPP task is to predict the retrieval quality of a search system for a query without relevance judgments. Research has shown the effectiveness and usefulness of QPP for ad-hoc search. Recent years have witnessed considerable progress in conversational search (CS). Effective QPP could help a CS system to decide an appropriate action to be taken at the next turn. Despite its potential, QPP for CS has been little studied. We address this research gap by reproducing and studying the effectiveness of existing QPP methods in the context of CS. While the task of passage retrieval remains the same in the two settings, a user query in CS depends on the conversational history, introducing novel QPP challenges. In particular, we seek to explore to what extent findings from QPP methods for ad-hoc search generalize to three CS settings: (i) estimating the retrieval quality of different query rewriting-based retrieval methods, (ii) estimating the retrieval quality of a conversational dense retrieval method, and (iii) estimating the retrieval quality for top ranks vs. deeper-ranked lists. Our findings can be summarized as follows: (i) supervised QPP methods distinctly outperform unsupervised counterparts only when a large-scale training set is available; (ii) point-wise supervised QPP methods outperform their list-wise counterparts in most cases; and (iii) retrieval score-based unsupervised QPP methods show high effectiveness in assessing the conversational dense retrieval method, ConvDR.