The rapid advancement of large language models (LLMs) has revolutionized various applications, but it has also raised concerns that these models may perpetuate bias and unfairness when deployed in social media contexts. Evaluating the biases and fairness of LLMs has therefore become crucial. However, existing methods rely on limited prompts that focus on only a few groups and lack a comprehensive categorical perspective. In this paper, we propose evaluating LLM biases through a group fairness lens, using a novel hierarchical schema that characterizes diverse social groups. Specifically, we construct a dataset, GFair, that encapsulates target-attribute combinations across multiple dimensions. In addition, we introduce statement organization, a new open-ended text generation task, to uncover complex biases in LLMs. Extensive evaluations of popular LLMs reveal inherent safety concerns. To mitigate these biases from a group fairness perspective, we propose GF-Think, a novel chain-of-thought method. Experimental results demonstrate its efficacy in mitigating bias in LLMs to achieve fairness.