Cultural bias is pervasive in many large language models (LLMs), largely due to a lack of data representative of different cultures. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating data from platforms such as Wikipedia and social media. However, these approaches depend heavily on real-world data and human annotation, making them costly and difficult to scale. Inspired by cognitive theories of social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents that play roles from different cultures. It generates high-quality cross-cultural dialogues that encapsulate human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples and used them to fine-tune eight culture-specific LLMs. We evaluated these models on three downstream tasks: content moderation, cultural alignment, and cultural education. For content moderation, our GPT-3.5-based models match or outperform GPT-4 across multiple datasets. For cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. For cultural education of human participants, our models yield better outcomes than GPT-4 in both learning efficacy and user experience. CulturePark is an important step toward addressing cultural bias and advancing the democratization of AI, and it highlights the critical role of culturally inclusive data in model training.