The rapid growth of Large Language Models (LLMs) has made the study of biases a crucial field. Assessing the influence of the different types of biases embedded in LLMs is essential to ensure their fair use in sensitive domains. Although bias assessment has been studied extensively for English, such efforts remain scarce for a major language like Bangla. In this work, we examine two types of social biases in LLM-generated output for Bangla. Our main contributions are: (1) a study of two different social biases for Bangla, (2) a curated dataset for benchmarking bias measurement, and (3) an evaluation of two different probing techniques for bias detection in the context of Bangla. To the best of our knowledge, this is the first such work on bias assessment of LLMs for Bangla. All our code and resources are publicly available to support bias-related research in Bangla NLP.