This paper identifies a cultural dominance issue in large language models (LLMs) such as ChatGPT, which arises from the predominant use of English data in model training. When users ask questions in non-English languages, LLMs often return answers rooted in English-speaking culture that are inappropriate for the culture the user expects. To evaluate the cultural dominance issue systematically, we build a benchmark of concrete (e.g., holidays and songs) and abstract (e.g., values and opinions) cultural objects. Empirical results show that representative GPT models suffer from the cultural dominance problem: GPT-4 is the most affected, while text-davinci-003 is the least affected. Our study emphasizes the need to critically examine cultural dominance and ethical considerations in the development and deployment of LLMs. We show that two straightforward methods, one in model development (pretraining on more diverse data) and one in deployment (e.g., culture-aware prompting), can significantly mitigate the cultural dominance issue in LLMs.