Large Language Models (LLMs), now used daily by millions of people, can encode societal biases, exposing their users to representational harms. A large body of scholarship on LLM bias exists, but it predominantly adopts a Western-centric frame and attends comparatively less to bias levels and potential harms in the Global South. In this paper, we quantify stereotypical bias in popular LLMs according to an Indian-centric frame and compare bias levels between the Indian and Western contexts. To do this, we develop a novel dataset, which we call Indian-BhED (Indian Bias Evaluation Dataset), containing stereotypical and anti-stereotypical examples for caste and religion contexts. We find that the majority of LLMs tested are strongly biased towards stereotypes in the Indian context, especially as compared to the Western context. Finally, we investigate Instruction Prompting as a simple intervention to mitigate such bias and find that it significantly reduces both stereotypical and anti-stereotypical biases in the majority of cases for GPT-3.5. The findings of this work highlight the need to include more diverse voices when evaluating LLMs.