As Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks, concerns about the potential negative societal impacts of LLM-generated content have also arisen. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. However, existing bias evaluation efforts often focus on only a particular type of bias and employ inconsistent evaluation metrics, making comparisons across different datasets and LLMs difficult. To address these limitations, we collect a variety of datasets designed for evaluating bias in LLMs, and further propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks. The curation of CEB is based on our newly proposed compositional taxonomy, which characterizes each dataset along three dimensions: bias types, social groups, and tasks. By combining these three dimensions, we develop a comprehensive strategy for evaluating bias in LLMs. Our experiments demonstrate that the levels of bias vary across these dimensions, thereby providing guidance for the development of targeted bias mitigation methods.