In this research, we introduce BEATS, a novel framework for evaluating Bias, Ethics, Fairness, and Factuality in Large Language Models (LLMs). Building on the BEATS framework, we present a bias benchmark for LLMs that measures performance across 29 distinct metrics. These metrics span a broad range of characteristics, including demographic, cognitive, and social biases, as well as measures of ethical reasoning, group fairness, and factuality-related misinformation risk. They enable a quantitative assessment of the extent to which LLM-generated responses may perpetuate societal prejudices that reinforce or expand systemic inequities. To achieve a high score on this benchmark, an LLM must exhibit highly equitable behavior in its responses, making the benchmark a rigorous standard for responsible AI evaluation. Empirical results from our experiments show that 37.65\% of outputs generated by industry-leading models contained some form of bias, highlighting a substantial risk in using these models for critical decision-making systems. The BEATS framework and benchmark offer a scalable and statistically rigorous methodology for benchmarking LLMs, diagnosing the factors driving bias, and developing mitigation strategies. With the BEATS framework, our goal is to support the development of more socially responsible and ethically aligned AI models.