Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying questions. This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents. To achieve this, we propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions. We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions. By addressing this gap in conversational search systems, we aim to provide additional supervision to enhance their active participation in the information-seeking process and provide users with more accurate results.