Clarification question models currently have limited success in engaging users within search systems, which casts doubt on their overall usefulness. Improving these models requires evaluation approaches that cover both real-time feedback from users (online evaluation) and the characteristics of clarification questions as judged by human assessors (offline evaluation). However, the relationship between online and offline evaluation has long been debated in information retrieval, since the two do not always agree. This study investigates to what extent this discordance holds in search clarification. We use user engagement as ground truth and employ several offline labels to examine how closely the offline ranked lists of clarification questions resemble the ideal ranked lists based on online user engagement.
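The comparison described above, between a ranking induced by an offline label and the ideal ranking induced by online engagement, can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not say which agreement measures the study uses, so Kendall's tau and nDCG are stand-ins chosen here, and the function names, the `offline_quality` label, and all numbers are hypothetical.

```python
from itertools import combinations
from math import log2


def kendall_tau(xs, ys):
    """Kendall tau-a between two equally long score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


def ndcg_against_engagement(offline_scores, engagement):
    """nDCG of the offline-induced ranking, with online engagement as gain."""
    def dcg(gains):
        return sum(g / log2(rank + 2) for rank, g in enumerate(gains))

    # Rank candidates by offline label, best first
    # (the stable sort keeps input order on ties).
    order = sorted(range(len(offline_scores)), key=lambda i: -offline_scores[i])
    # The ideal list ranks candidates by engagement itself.
    ideal = dcg(sorted(engagement, reverse=True))
    return dcg([engagement[i] for i in order]) / ideal if ideal > 0 else 0.0


# Hypothetical data: four candidate clarification questions for one query.
offline_quality = [3, 1, 2, 0]          # assumed human-assigned label
engagement = [0.10, 0.40, 0.30, 0.05]   # assumed engagement rate (ground truth)

tau = kendall_tau(offline_quality, engagement)
ndcg = ndcg_against_engagement(offline_quality, engagement)
print(f"Kendall tau = {tau:.2f}, nDCG = {ndcg:.2f}")
```

A tau near 1 or an nDCG near 1 would indicate that the offline label reproduces the engagement-based ordering, while low values would indicate the kind of online/offline discordance the study examines.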