Empathy plays a pivotal role in fostering prosocial behavior and is often triggered by the sharing of personal experiences through narratives. However, modeling empathy with NLP approaches remains challenging due to its deep interconnection with the dynamics of human interaction. Previous approaches, which fine-tune language models (LMs) on human-annotated empathy datasets, have had limited success. To improve empathy understanding in LMs, we propose several strategies, including contrastive learning with masked LMs and supervised fine-tuning with large language models. While these methods improve over previous ones, the overall results remain unsatisfactory. To better understand this outcome, we performed an analysis that revealed low agreement among annotators. This lack of consensus hinders training and highlights the subjective nature of the task. We also explore the impact of culture on annotations: we carefully collected story pairs in Urdu and find that the subjectivity in interpreting empathy among annotators appears to be independent of cultural background. Our systematic exploration of LMs' understanding of empathy reveals substantial opportunities for further investigation in both task formulation and modeling.
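The low-agreement finding rests on standard inter-annotator agreement statistics. As a minimal sketch of that kind of analysis, the snippet below computes pairwise Cohen's kappa for two annotators labeling which story of a pair is more empathic; the label sequences are hypothetical, not data from this work, and the paper may use a different agreement coefficient.

```python
# Hedged sketch: pairwise Cohen's kappa between two annotators'
# binary labels. The labels below are invented for illustration.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items labeled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if the two annotators labeled independently.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators judging ten story pairs (0/1 = which
# story in the pair they found more empathic).
ann1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
ann2 = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
print(round(cohens_kappa(ann1, ann2), 3))  # prints 0.0
```

A kappa near zero, as in this toy example, means the annotators agree no more often than chance, which is the pattern the abstract describes as hindering training.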