Implicit user feedback, user emotions and demographic information have been shown to be promising sources for improving the accuracy and user engagement of responses generated by dialogue systems. However, the influence of such information on task completion and factual consistency, which are important criteria for task-oriented and document-grounded dialogues, is not yet known. To address this, we introduce FEDI, the first English task-oriented and document-grounded dialogue dataset annotated with this information. Our experiments with Flan-T5, GPT-2 and Llama 2 show a particularly positive impact on task completion and factual consistency. Participants in our human evaluation reported that the responses generated by the feedback-trained models were more informative (Flan-T5 and GPT-2) and more relevant and factually consistent (Llama 2).