Large language models (LLMs) are reported to be biased toward certain cultures, owing to the dominance of English corpora in their training data. Since multilingual cultural data are often expensive to collect, existing efforts address this through prompt engineering or culture-specific pre-training. However, these approaches may overlook the knowledge deficiency of low-resource cultures and require extensive computing resources. In this paper, we propose CultureLLM, a cost-effective solution for incorporating cultural differences into LLMs. CultureLLM adopts the World Values Survey (WVS) as seed data and generates semantically equivalent training data via the proposed semantic data augmentation. Using only 50 seed samples from the WVS with augmented data, we fine-tune culture-specific LLMs and one unified model (CultureLLM-One) for 9 cultures, covering both rich- and low-resource languages. Extensive experiments on 60 culture-related datasets demonstrate that CultureLLM significantly outperforms various counterparts such as GPT-3.5 (by 8.1%) and Gemini Pro (by 9.5%), with performance comparable to or even better than GPT-4. Our human study shows that the generated samples are semantically equivalent to the original samples, providing an effective solution for LLM augmentation. Code is released at https://github.com/Scarelette/CultureLLM.
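The seed-then-augment pipeline described above can be sketched in a few lines. This is only an illustrative assumption, not the paper's actual implementation: the prompt template, the `build_paraphrase_prompt` and `augment_seed` helpers, and the example seed question are all hypothetical, and the paraphrase candidates would in practice come from a generator LLM.

```python
def build_paraphrase_prompt(question: str, n: int = 5) -> str:
    """Compose a request asking a generator LLM for n semantically
    equivalent rewrites of a seed survey question (hypothetical prompt)."""
    return (
        f"Rewrite the following survey question in {n} different ways, "
        f"preserving its exact meaning:\n{question}"
    )

def augment_seed(seed: dict, candidates: list[str]) -> list[dict]:
    """Pair each paraphrased question with the seed's original answer,
    dropping duplicates so the fine-tuning set stays diverse."""
    seen = {seed["question"]}
    augmented = []
    for q in candidates:
        q = q.strip()
        if q and q not in seen:
            seen.add(q)
            # The answer label is kept fixed: paraphrases are assumed to be
            # semantically equivalent, so the culture-specific answer carries over.
            augmented.append({"question": q, "answer": seed["answer"]})
    return augmented

# Example seed in the style of a WVS item (wording is illustrative).
seed = {
    "question": "Is it justifiable to avoid a fare on public transport?",
    "answer": "Never justifiable",
}
candidates = [
    "Is dodging a public-transport fare ever acceptable?",
    "Is dodging a public-transport fare ever acceptable?",  # duplicate, dropped
    "Can skipping the fare on a bus or train be justified?",
]
augmented = augment_seed(seed, candidates)
```

The fine-tuning set would then be the seed samples plus `augmented`, which is how 50 seeds can be stretched into a larger culture-specific training corpus.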