While Large Language Models (LLMs) have had a massive technological impact over the past decade, enabling a wide range of human-facing applications, they can produce output containing stereotypes and biases, especially for low-resource languages. This raises serious ethical concerns when dealing with sensitive topics such as religion. As a step toward making LLMs fairer, we explore bias from a religious perspective in Bengali, focusing on the two main religious dialects: those associated with the Hindu-majority and Muslim-majority communities. We conduct a series of experiments and audits, presenting a comparative analysis of how three commonly used LLMs, ChatGPT, Gemini, and Microsoft Copilot, respond to sentences containing dialect-specific words from the Hindu and Muslim dialects, and showing which responses reflect social biases and which do not. Furthermore, we analyze our findings, relate them to potential causes and evaluation perspectives, and consider their global impact given that Bengali has over 300 million speakers worldwide. With this work, we hope to establish rigor for building greater fairness into LLMs, as they are widely used as creative writing agents.