With the increasing adoption of large language models (LLMs), ensuring their alignment with social norms has become a critical concern. While prior research has examined bias detection in various languages, there remains a significant gap in resources addressing social biases within Persian cultural contexts. In this work, we introduce PBBQ, a comprehensive benchmark dataset designed to evaluate social biases in Persian LLMs. Our benchmark, which encompasses 16 cultural categories, was developed through questionnaires completed by 250 individuals across diverse demographics, in close collaboration with social science experts to ensure its validity. The resulting PBBQ dataset contains over 37,000 carefully curated questions, providing a foundation for the evaluation and mitigation of bias in Persian language models. We benchmark several open-source LLMs, a closed-source model, and Persian-specific fine-tuned models on PBBQ. Our findings reveal that current LLMs exhibit significant social biases across Persian cultural contexts. Additionally, by comparing model outputs to human responses, we observe that LLMs often replicate human bias patterns, highlighting the complex interplay between learned representations and cultural stereotypes. Upon acceptance of the paper, our PBBQ dataset will be publicly available for use in future work. Content warning: This paper contains unsafe content.