As the reach of large language models (LMs) expands globally, their ability to cater to diverse cultural contexts becomes crucial. Despite advancements in multilingual capabilities, models are not designed with appropriate cultural nuances in mind. In this paper, we show that multilingual and Arabic monolingual LMs exhibit bias towards entities associated with Western culture. We introduce CAMeL, a novel resource of 628 naturally-occurring prompts and 20,368 entities spanning eight types that contrast Arab and Western cultures. CAMeL provides a foundation for measuring cultural biases in LMs through both extrinsic and intrinsic evaluations. Using CAMeL, we examine the cross-cultural performance in Arabic of 16 different LMs on tasks such as story generation, named entity recognition (NER), and sentiment analysis, where we find concerning cases of stereotyping and cultural unfairness. We further test their text-infilling performance, revealing that they fail to adapt appropriately to Arab cultural contexts. Finally, we analyze 6 Arabic pre-training corpora and find that commonly used sources such as Wikipedia may not be best suited to building culturally aware LMs if used as-is, without adjustment. We will make CAMeL publicly available at: https://github.com/tareknaous/camel