Open-domain conversational agents can answer a broad range of targeted queries. However, the sequential nature of interaction with these systems makes knowledge exploration a lengthy task that burdens the user with asking a chain of well-phrased questions. In this paper, we present a retrieval-based system and an associated dataset for predicting the next questions a user might ask. Such a system can proactively assist users in knowledge exploration, leading to a more engaging dialog. The retrieval system is trained on a dataset of ~14K multi-turn information-seeking conversations, each paired with a valid follow-up question and a set of invalid candidates. The invalid candidates are generated to simulate various syntactic and semantic confounders, such as paraphrases, partial entity matches, irrelevant entities, and automatic speech recognition (ASR) errors. We use confounder-specific techniques to simulate these negative examples on the OR-QuAC dataset and build a dataset called the Follow-up Query Bank (FQ-Bank). We then train ranking models on FQ-Bank and present results comparing supervised and unsupervised approaches. The results suggest that valid follow-ups can be retrieved by ranking them above the confounders, but that further knowledge grounding could improve ranking performance.
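To make the ranking setup concrete, the following is a minimal sketch of an unsupervised baseline, under stated assumptions. It scores each candidate follow-up by bag-of-words cosine similarity to the conversation history and reports the reciprocal rank of the valid follow-up. The scoring function, the helper names (`rank_candidates`, `reciprocal_rank`), and the toy conversation are illustrative assumptions; they are not taken from FQ-Bank and are not necessarily the approaches evaluated in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(history, candidates):
    """Sort candidates by similarity to the conversation history, best first."""
    context = Counter(tokenize(" ".join(history)))
    scored = [(cosine(context, Counter(tokenize(c))), c) for c in candidates]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return [c for _, c in sorted(scored, key=lambda pair: -pair[0])]


def reciprocal_rank(ranked, valid):
    """Return 1 / (1-based position of the valid follow-up), or 0.0 if absent."""
    return 1.0 / (ranked.index(valid) + 1) if valid in ranked else 0.0


# Hypothetical toy conversation and candidate pool (not from FQ-Bank).
history = [
    "Who founded the band?",
    "The band was founded by the guitarist in 1990.",
]
valid = "What albums did the band release after 1990?"
candidates = [
    "Who started the band?",            # paraphrase of an earlier turn
    valid,                              # valid follow-up
    "What is the capital of France?",   # irrelevant entity
]

ranked = rank_candidates(history, candidates)
print(ranked)
print("reciprocal rank:", reciprocal_rank(ranked, valid))
```

A purely lexical scorer like this one is easily misled by confounders such as paraphrases of earlier turns, which share many tokens with the history. That weakness is the kind of failure the confounder types in the dataset are designed to expose.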