We introduce the concept of "empathic grounding" in conversational agents as an extension of Clark's conceptualization of grounding in conversation in which the grounding criterion includes listener empathy for the speaker's affective state. Empathic grounding is generally required whenever the speaker's emotions are foregrounded and can make the grounding process more efficient and reliable by communicating both propositional and affective understanding. Both speaker expressions of affect and listener empathic grounding can be multimodal, including facial expressions and other nonverbal displays. Thus, models of empathic grounding for embodied agents should be multimodal to facilitate natural and efficient communication. We describe a multimodal model that takes as input user speech and facial expression to generate multimodal grounding moves for a listening agent using a large language model. We also describe a testbed to evaluate approaches to empathic grounding, in which a humanoid robot interviews a user about a past episode of pain and then has the user rate their perception of the robot's empathy. We compare our proposed model to one that only generates non-affective grounding cues in a between-subjects experiment. Findings demonstrate that empathic grounding increases user perceptions of empathy, understanding, emotional intelligence, and trust. Our work highlights the role of emotion awareness and multimodality in generating appropriate grounding moves for conversational agents.