Empathetic response generation is a desirable trait of conversational agents, crucial for facilitating engaging and emotionally intelligent multi-turn conversations between humans and machines. Leveraging large language models (LLMs) for this task has shown promising results, yet challenges persist in ensuring both the empathetic quality of the responses and the retention of the models' generalization performance. In this paper, we propose a novel approach in which we construct theory-driven preference datasets and use them to align LLMs with preference optimization algorithms to address these challenges. To measure empathetic response generation, we employ the EmpatheticDialogues dataset, assessing empathy with the diff-EPITOME and BERTScore metrics, and evaluate generalization performance on the MMLU benchmark. We make all datasets, source code, and models publicly available.