Existing approaches to bias evaluation in large language models (LLMs) trade off ecological validity against statistical control: they rely either on artificial prompts that poorly reflect real-world use or on naturalistic tasks that lack scale and rigor. We introduce a scalable bias-auditing framework that uses named entities as controlled probes to measure systematic disparities in model behavior. Synthetic data enables us to construct diverse, controlled inputs, and we show that it reliably reproduces bias patterns observed in natural text, supporting its use for large-scale analysis. Using this framework, we conduct the largest bias audit to date, comprising 1.9 billion data points across multiple entity types, tasks, languages, models, and prompting strategies. We find consistent patterns: models penalize right-wing politicians and favor left-wing politicians, prefer Western and wealthier countries over the Global South, favor Western companies, and penalize firms in the defense and pharmaceutical sectors. While instruction tuning reduces bias, increasing model scale amplifies it, and prompting in Chinese or Russian does not mitigate Western-aligned preferences. These findings highlight the need for systematic bias auditing before deploying LLMs in high-stakes applications. Our framework is extensible to other domains and tasks, and we make it publicly available to support future work.