Socio-demographic prompting is a commonly employed approach for studying cultural biases in LLMs as well as for aligning models to certain cultures. In this paper, we systematically probe four LLMs (Llama 3, Mistral v0.2, GPT-3.5 Turbo and GPT-4) with prompts conditioned on culturally sensitive and non-sensitive cues, on datasets that are supposed to be culturally sensitive (EtiCor and CALI) or neutral (MMLU and ETHICS). We observe that all models except GPT-4 show significant variations in their responses on both kinds of datasets for both kinds of prompts, casting doubt on the robustness of culturally conditioned prompting as a method for eliciting cultural bias in models or as an alignment strategy. The work also calls for rethinking control experiment design to tease apart the cultural conditioning of responses from the "placebo effect", i.e., random perturbations of model responses due to arbitrary tokens in the prompt.