Current approaches to empathetic response generation typically encode the entire dialogue history and feed the encoded representation to a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect the speaker's direct intention. We argue that, empirically, the last utterance in a dialogue conveys the speaker's intention. We therefore propose InferEM, a novel model for empathetic response generation. InferEM encodes the last utterance separately and fuses it with the entire dialogue through an intention fusion module based on multi-head attention, thereby capturing the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which simulates the human tendency to guess in advance what the interlocutor may say. To balance the optimization rates of utterance prediction and response generation, we design a multi-task learning strategy for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
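To make the fusion idea concrete, the following is a minimal sketch and not the InferEM implementation. It assumes that each token vector of the last utterance acts as a query over the token vectors of the whole dialogue, that heads are formed by slicing the feature dimension without learned projections, and that the two training objectives are combined as a fixed weighted sum. The names `fuse_intention`, `multitask_loss`, and `pred_weight` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def split_heads(vec, num_heads):
    """Slice a feature vector into equally sized per-head chunks."""
    size = len(vec) // num_heads
    return [vec[h * size:(h + 1) * size] for h in range(num_heads)]


def fuse_intention(last_utterance, context, num_heads=2):
    """Hypothetical fusion step: every last-utterance token attends over the
    full dialogue context, and the per-head outputs are concatenated.
    Assumption: no learned projection matrices, to keep the sketch short."""
    ctx_heads = [split_heads(c, num_heads) for c in context]
    fused = []
    for token in last_utterance:
        q_heads = split_heads(token, num_heads)
        out = []
        for h in range(num_heads):
            keys = [c[h] for c in ctx_heads]
            out.extend(attend(q_heads[h], keys, keys))
        fused.append(out)
    return fused


def multitask_loss(gen_loss, pred_loss, pred_weight=0.5):
    """Assumed weighted-sum objective; the actual balancing strategy
    used by InferEM is not specified in the abstract."""
    return gen_loss + pred_weight * pred_loss


# Toy dialogue: two context tokens and one last-utterance token, dimension 4.
context = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
last_utterance = [[1.0, 0.0, 0.0, 1.0]]
fused = fuse_intention(last_utterance, context, num_heads=2)
total = multitask_loss(gen_loss=2.0, pred_loss=1.0, pred_weight=0.5)
```

In this toy setting the fused vector leans toward the context token that matches the last utterance, which is the behaviour an intention fusion module is meant to encourage. A trained model would add learned query, key, and value projections and would tune or schedule the loss weight.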