Empathetic response generation, which aims to understand the user's situation and feelings and respond empathetically, is crucial for building human-like dialogue systems. Traditional approaches typically employ maximum likelihood estimation as the optimization objective during training, but fail to align the empathy levels of generated and target responses. To this end, we propose an empathetic response generation framework based on reinforcement learning (EmpRL). The framework designs an effective empathy reward function and generates empathetic responses by maximizing the expected reward through reinforcement learning. EmpRL uses the pre-trained T5 model as its generator and fine-tunes it to initialize the policy. To align the empathy levels of generated and target responses within a given context, an empathy reward function covering three empathy communication mechanisms -- emotional reaction, interpretation, and exploration -- is constructed using pre-designed and pre-trained empathy identifiers. During reinforcement learning training, the proximal policy optimization algorithm is used to fine-tune the policy, enabling the generation of empathetic responses. Both automatic and human evaluations demonstrate that the proposed EmpRL framework significantly improves the quality of generated responses, enhances the similarity in empathy levels between generated and target responses, and produces empathetic responses covering both affective and cognitive aspects.
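The empathy-level alignment described above can be sketched as a simple reward computation. The sketch below is illustrative only: the paper's identifiers are pre-trained models, while here their outputs are assumed to be per-mechanism empathy levels (e.g. 0/1/2 for none/weak/strong, a common annotation scheme), and the negative weighted distance form and the `weights` parameter are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of an empathy reward that aligns generated and
# target responses across the three communication mechanisms:
# emotional reaction, interpretation, and exploration.
# In EmpRL these levels would come from pre-trained empathy identifiers;
# here they are plain integers for illustration.

def empathy_reward(gen_levels, tgt_levels, weights=(1.0, 1.0, 1.0)):
    """Negative weighted distance between the generated response's and the
    target response's empathy levels over the three mechanisms.
    A reward of 0 means the empathy levels match exactly."""
    assert len(gen_levels) == len(tgt_levels) == len(weights) == 3
    return -sum(w * abs(g - t)
                for w, g, t in zip(weights, gen_levels, tgt_levels))
```

During PPO fine-tuning, a reward of this shape would be computed for each sampled response and maximized in expectation, pushing the policy toward responses whose empathy levels match the target's.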