In this paper, we identify a cultural dominance issue within large language models (LLMs) such as ChatGPT, which arises from the predominant use of English data in model training. When users ask questions in non-English languages, LLMs often provide inappropriate English-culture-related answers that are not relevant to the expected culture. To systematically evaluate the cultural dominance issue, we build a benchmark that consists of both concrete (e.g., holidays and songs) and abstract (e.g., values and opinions) cultural objects. Empirical results show that representative GPT models suffer from the cultural dominance problem, with GPT-4 the most affected and text-davinci-003 the least. Our study emphasizes the need for critical examination of cultural dominance and for ethical consideration in the development and deployment of LLMs. We show that two straightforward methods, one in model development (i.e., pretraining on more diverse data) and one in deployment (e.g., culture-aware prompting), can significantly mitigate the cultural dominance issue in LLMs.