Research on conversational search has so far mostly focused on query rewriting and multi-stage passage retrieval. However, synthesizing the top retrieved passages into a complete, relevant, and concise response is still an open challenge. Having snippet-level annotations of relevant passages would enable both (1) the training of response generation models that are able to ground answers in actual statements and (2) the automatic evaluation of the generated responses in terms of completeness. In this paper, we address the problem of collecting high-quality snippet-level answer annotations for two of the TREC Conversational Assistance track datasets. To ensure quality, we first perform a preliminary annotation study, employing different task designs, crowdsourcing platforms, and workers with different qualifications. Based on the outcomes of this study, we refine our annotation protocol before proceeding with the full-scale data collection. Overall, we gather annotations for 1.8k question-paragraph pairs, each annotated by three independent crowd workers. The process of collecting data at this magnitude also led to multiple insights about the problem that can inform the design of future response-generation methods. This is an extended version of the article published with the same title in the Proceedings of CIKM'23.