Although empathic interaction between counselor and client is fundamental to the success of psychotherapy, few datasets currently support computational approaches to understanding empathy. In this paper, we construct a multimodal empathy dataset collected from face-to-face psychological counseling sessions. The dataset consists of 771 video clips. We also propose three labels (expression of experience, emotional reaction, and cognitive reaction) to describe the degree of empathy between counselors and their clients. Expression of experience describes whether the client has expressed experiences that can trigger empathy, while emotional reaction and cognitive reaction describe the counselor's empathic responses. As an initial assessment of the dataset's usability, we conduct an inter-rater reliability analysis of the annotators' subjective evaluations of the video clips using the intraclass correlation coefficient and Fleiss' kappa. The results indicate that our annotation is reliable. Furthermore, we conduct empathy prediction with three typical methods: the tensor fusion network, the sentimental words aware fusion network, and a simple concatenation model. The experimental results show that empathy can be predicted well on our dataset. Our dataset is available for research purposes.
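For readers unfamiliar with Fleiss' kappa, the following is a minimal sketch of how the statistic is computed from a table of rating counts. It is not the annotation pipeline used for this dataset: the function name `fleiss_kappa`, the toy count table, and the choice of two categories and three raters are all assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j.

    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Observed agreement: for each item, the fraction of rater pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: based on the overall share of ratings in each category.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        # All ratings fall in a single category, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: 4 clips, 3 annotators, 2 categories (not from the dataset).
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))
```

Values near 1 indicate agreement well above chance, and values near or below 0 indicate agreement no better than chance. For ordinal or continuous ratings, the intraclass correlation coefficient is the usual complement to this statistic.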