Empathetic response generation, which aims to understand the user's situation and feelings and respond empathetically, is crucial for building human-like dialogue systems. Previous methods mainly use maximum likelihood estimation as the training objective for response generation models, without accounting for the alignment of empathy levels between generated and target responses. To this end, we propose an empathetic response generation framework based on reinforcement learning (EmpRL). The framework designs an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. Given the powerful text generation capability of pre-trained language models, EmpRL uses the pre-trained T5 model as its generator and fine-tunes it to initialize the policy. To align the empathy levels of generated and target responses within the dialogue context, an empathy reward function covering three empathy communication mechanisms, i.e., emotional reaction, interpretation, and exploration, is constructed using pre-designed and pre-trained empathy identifiers. Finally, the proximal policy optimization (PPO) algorithm is used to further train the policy to produce empathetic responses. Both automatic and manual evaluations demonstrate that EmpRL improves the quality of generated responses, increases the similarity in empathy level between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
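To illustrate the reward design described above, the following is a minimal sketch of how an empathy-alignment reward over the three communication mechanisms (emotional reaction, interpretation, exploration) might be combined. The function name, the per-mechanism level scores, and the negative-absolute-gap form are illustrative assumptions, not the paper's exact formulation; the actual levels would come from the pre-trained empathy identifiers.

```python
# Hypothetical sketch: reward the policy for matching the target response's
# empathy levels across the three empathy communication mechanisms.
# The scoring form (negative mean absolute gap) is an assumption.

MECHANISMS = ("emotional_reaction", "interpretation", "exploration")

def empathy_reward(gen_levels, target_levels, weights=None):
    """Return a reward that is highest (0.0) when the generated response's
    empathy level matches the target's on every mechanism, and more
    negative as the levels diverge."""
    if weights is None:
        weights = {m: 1.0 for m in MECHANISMS}
    gap = sum(weights[m] * abs(gen_levels[m] - target_levels[m])
              for m in MECHANISMS)
    return -gap / sum(weights.values())

# Usage: identifier-predicted levels (e.g., 0 = none, 1 = weak, 2 = strong)
target = {"emotional_reaction": 2, "interpretation": 1, "exploration": 0}
matched = {"emotional_reaction": 2, "interpretation": 1, "exploration": 0}
off = {"emotional_reaction": 0, "interpretation": 1, "exploration": 0}

print(empathy_reward(matched, target))  # 0.0, perfect alignment
print(empathy_reward(off, target))      # negative, levels diverge
```

In an RL setup of this kind, such a scalar reward would be fed to PPO to update the T5-initialized policy, pushing generated responses toward the target's empathy profile rather than only maximizing likelihood.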