GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models

Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but they have also been observed to magnify societal biases, particularly those related to gender. Several benchmarks have been proposed to assess gender bias in LLMs, but they often lack practical flexibility or inadvertently introduce biases of their own. To address these shortcomings, we introduce GenderCARE, a comprehensive framework that encompasses Criteria, bias Assessment, Reduction techniques, and Evaluation metrics for quantifying and mitigating gender bias in LLMs. We first establish criteria for gender equality benchmarks that span six dimensions: inclusivity, diversity, explainability, objectivity, robustness, and realisticity. Guided by these criteria, we construct GenderPair, a novel pair-based benchmark designed to assess gender bias in LLMs comprehensively. The benchmark provides standardized and realistic evaluations and covers previously overlooked gender groups, such as transgender and non-binary individuals. We further develop debiasing techniques that combine counterfactual data augmentation with specialized fine-tuning strategies to reduce gender bias in LLMs without compromising their overall performance. Extensive experiments across 17 different LLMs show significant bias reductions on various gender bias benchmarks, peaking at over 90% and averaging above 35%. Importantly, these reductions change performance on mainstream language tasks by less than 2%. By offering a realistic assessment and tailored reduction of gender bias, we hope that GenderCARE represents a significant step towards fairness and equity in LLMs. More details are available at https://github.com/kstanghere/GenderCARE-ccs24.
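
To make the idea of counterfactual data augmentation concrete, the following is a minimal sketch, not the GenderCARE implementation. It assumes a simple word-swap approach: the lexicon `SWAP` and the helpers `counterfactual` and `augment` are hypothetical names introduced here for illustration. The lexicon is deliberately tiny and binary. It omits ambiguous forms such as "her" (which can map to "him" or "his" and would need part-of-speech information), and it does not cover the transgender and non-binary groups that the benchmark includes.

```python
import re

# Hypothetical swap lexicon for illustration only; not the paper's word list.
# Ambiguous forms ("her" -> "him"/"his") are left out on purpose.
SWAP = {
    "he": "she", "she": "he",
    "himself": "herself", "herself": "himself",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Match only whole words from the lexicon, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b", re.IGNORECASE
)


def _match_case(source: str, target: str) -> str:
    """Copy the capitalisation style of `source` onto `target`."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def counterfactual(text: str) -> str:
    """Return `text` with every lexicon word replaced by its counterpart."""
    return _PATTERN.sub(
        lambda m: _match_case(m.group(0), SWAP[m.group(0).lower()]), text
    )


def augment(corpus: list[str]) -> list[str]:
    """Keep each sentence and add its counterfactual when it differs."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["She is a doctor.", "It rains."]):
        print(line)
```

In this sketch, the augmented corpus pairs each gendered sentence with its swapped counterpart, so a model fine-tuned on it sees both variants equally often. Sentences without lexicon words pass through unchanged.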
