Recent advances in large language models (LLMs) have significantly improved natural language processing (NLP) applications. However, these models can also inherit and perpetuate biases from their training data. Addressing this issue is crucial, yet most existing datasets do not support evaluation across diverse NLP tasks. To address this gap, we introduce the Bias Evaluations Across Domains (BEADs) dataset, designed to support a wide range of NLP tasks, including text classification, bias entity recognition, bias quantification, and benign language generation. BEADs combines AI-driven annotation with expert verification to provide reliable labels, overcoming the limitations of existing datasets that typically rely on crowd-sourcing, expert-only annotation with limited bias coverage, or unverified AI labeling. Our empirical analysis shows that BEADs is effective in detecting and reducing biases across different language models: smaller models fine-tuned on BEADs often outperform LLMs in bias classification tasks, although these models may still exhibit biases towards certain demographics. Fine-tuning LLMs on our benign language data also reduces bias while preserving the models' knowledge. Our findings highlight the importance of comprehensive bias evaluation and the potential of targeted fine-tuning for reducing bias in LLMs. We make BEADs publicly available at https://huggingface.co/datasets/shainar/BEAD. Warning: This paper contains examples that may be considered offensive.