GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individuals' group identities. To address this gap, we introduce GIEBench, a comprehensive benchmark that includes 11 identity dimensions, covering 97 group identities with a total of 999 single-choice questions related to specific group identities. GIEBench is designed to evaluate the empathy of LLMs when presented with specific group identities such as gender, age, occupation, and race, emphasizing their ability to respond from the standpoint of the identified group. This supports the ongoing development of empathetic LLM applications tailored to users with different identities. Our evaluation of 23 LLMs revealed that while these LLMs understand different identity standpoints, they fail to consistently exhibit equal empathy across these identities without explicit instructions to adopt those perspectives. This highlights the need for improved alignment of LLMs with diverse values to better accommodate the multifaceted nature of human identities. Our datasets are available at https://github.com/GIEBench/GIEBench.
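The comparison described above (accuracy on single-choice questions with and without an explicit instruction to adopt the group's perspective) could be scored roughly as in the sketch below. It is an illustration only, not the authors' evaluation code: the record fields (`dimension`, `identity`, `question`, `options`, `answer`), the prompt wording, and the `answer_fn` callable standing in for a model are all assumptions, and the actual GIEBench schema and prompts may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical record layout (assumption); the real dataset schema may differ.
Question = Dict[str, str]


def build_prompt(q: Question, with_identity: bool) -> str:
    """Format one single-choice question, optionally naming the identity to adopt."""
    prefix = (
        f"Answer from the standpoint of: {q['identity']}.\n" if with_identity else ""
    )
    return f"{prefix}{q['question']}\n{q['options']}\nAnswer with one letter."


def accuracy_by_dimension(
    questions: List[Question],
    answer_fn: Callable[[str], str],  # stand-in for a call to an LLM (assumption)
    with_identity: bool,
) -> Dict[str, float]:
    """Return per-dimension accuracy under one prompting condition."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for q in questions:
        # Keep only the first character of the reply as the chosen option letter.
        pred = answer_fn(build_prompt(q, with_identity)).strip().upper()[:1]
        total[q["dimension"]] += 1
        correct[q["dimension"]] += int(pred == q["answer"])
    return {dim: correct[dim] / total[dim] for dim in total}
```

Calling `accuracy_by_dimension` twice on the same questions, once with `with_identity=True` and once with `with_identity=False`, gives the two per-dimension score tables whose gap the abstract refers to.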

