Modeling empathy is a complex endeavor rooted in the interpersonal and experiential dimensions of human interaction, and it remains an open problem in AI. Existing empathy datasets fall short in capturing the richness of empathic responses: they are often confined to in-lab or acted scenarios, lack longitudinal data, and lack self-reported labels. We introduce EmpathicStories++ (https://mitmedialab.github.io/empathic-stories-multimodal/), a new multimodal dataset for empathy during personal experience sharing, containing 53 hours of video, audio, and text data from 41 participants sharing vulnerable experiences and reading empathically resonant stories with an AI agent. EmpathicStories++ is the first longitudinal dataset on empathy, collected over a month-long deployment of social robots in participants' homes, during which participants engaged in natural, empathic storytelling interactions with AI agents. We then introduce a novel task: predicting individuals' empathy toward others' stories based on their personal experiences, evaluated in two contexts, namely the participants' own shared personal stories and their reflections on the stories they read. We benchmark this task using state-of-the-art models to pave the way for future improvements in contextualized and longitudinal empathy modeling. Our work provides a valuable resource for research on developing empathetic AI systems and on understanding the intricacies of human empathy in genuine, real-world settings.