CulturePark: Boosting Cross-cultural Understanding in Large Language Models

Cultural bias is pervasive in many large language models (LLMs), largely because data representative of different cultures is scarce. Typically, cultural datasets and benchmarks are constructed either by extracting subsets of existing datasets or by aggregating content from platforms such as Wikipedia and social media. However, these approaches depend heavily on real-world data and human annotation, making them costly and difficult to scale. Inspired by cognitive theories on social communication, this paper introduces CulturePark, an LLM-powered multi-agent communication framework for cultural data collection. CulturePark simulates cross-cultural human communication with LLM-based agents playing roles in different cultures. It generates high-quality cross-cultural dialogues encapsulating human beliefs, norms, and customs. Using CulturePark, we generated 41,000 cultural samples to fine-tune eight culture-specific LLMs. We evaluated these models across three downstream tasks: content moderation, cultural alignment, and cultural education. Results show that for content moderation, our GPT-3.5-based models either match or outperform GPT-4 on the evaluated datasets. Regarding cultural alignment, our models surpass GPT-4 on Hofstede's VSM 13 framework. Furthermore, for cultural education of human participants, our models demonstrate superior outcomes in both learning efficacy and user experience compared to GPT-4. CulturePark is an important step toward addressing cultural bias and advancing the democratization of AI, highlighting the critical role of culturally inclusive data in model training.

