Linguistic pragmatics holds that a conversation's underlying speech acts can constrain the type of response that is appropriate at each turn. Neural dialogue agents, however, struggle to generate diverse responses. Dialogue diversity is currently assessed with automatic metrics, but these metrics are not informed by the underlying speech acts. To remedy this, we propose the notion of Pragmatically Appropriate Diversity, defined as the extent to which a conversation creates and constrains the creation of multiple diverse responses. Using a human-created multi-response dataset, we find significant support for the hypothesis that speech acts provide a signal for the diversity of the set of next responses. Building on this result, we propose a new human evaluation task in which creative writers predict the extent to which conversations inspire the creation of multiple diverse responses. Our studies find that writers' judgments align with the Pragmatically Appropriate Diversity of conversations. Our work suggests that expectations for diversity metric scores should vary depending on the speech act.
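To make the idea of speech-act-dependent diversity scores concrete, the following is a minimal sketch, not the evaluation used in this work. It assumes distinct-n (the ratio of unique to total n-grams) as a stand-in automatic diversity metric, whitespace tokenization, and a hypothetical data layout in which each conversation contributes a speech act label and a set of next responses. The function names, labels, and toy responses are invented for illustration only.

```python
from collections import defaultdict


def distinct_n(responses, n=1):
    """Ratio of unique n-grams to total n-grams across a set of responses."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()  # assumption: simple whitespace tokens
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def diversity_by_speech_act(examples, n=1):
    """Mean distinct-n of the response sets, grouped by speech act label."""
    scores = defaultdict(list)
    for speech_act, responses in examples:
        scores[speech_act].append(distinct_n(responses, n))
    return {act: sum(vals) / len(vals) for act, vals in scores.items()}


# Toy, made-up data: (speech act label, set of next responses).
toy_examples = [
    ("yes-no question", ["yes i do", "yes i do", "no i do not"]),
    ("open question", ["i went hiking", "we cooked dinner", "mostly slept in"]),
]

result = diversity_by_speech_act(toy_examples)
for act, score in result.items():
    print(f"{act}: distinct-1 = {score:.2f}")
```

In this toy setting the response set that follows the yes-no question scores lower than the one that follows the open question, which is the kind of gap that would justify different expected metric values for different speech acts.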