Socio-demographic prompting is commonly used both to study cultural biases in LLMs and to align models to particular cultures. In this paper, we systematically probe four LLMs (Llama 3, Mistral v0.2, GPT-3.5 Turbo and GPT-4) with prompts conditioned on culturally sensitive and non-sensitive cues, on datasets that are supposed to be either culturally sensitive (EtiCor and CALI) or neutral (MMLU and ETHICS). We observe that all models except GPT-4 show significant variations in their responses on both kinds of datasets for both kinds of prompts, casting doubt on the robustness of culturally conditioned prompting as a method for eliciting cultural bias in models or as an alignment strategy. The work also calls for rethinking control experiment design to tease apart the cultural conditioning of responses from the "placebo effect", i.e., random perturbations of model responses due to arbitrary tokens in the prompt.
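The comparison between cultural cues and placebo cues can be illustrated with a minimal sketch. It is not the paper's actual metric or pipeline: the function names `flip_rate` and `compare_cues`, the cue labels, and the toy answers are all hypothetical, and the measure used here (the fraction of items whose answer differs from an unconditioned baseline) is only one simple way to quantify response variation.

```python
from typing import Dict, List


def flip_rate(baseline: List[str], conditioned: List[str]) -> float:
    """Fraction of items whose answer differs from the unconditioned baseline."""
    if len(baseline) != len(conditioned):
        raise ValueError("answer lists must be aligned item by item")
    if not baseline:
        return 0.0
    changed = sum(b != c for b, c in zip(baseline, conditioned))
    return changed / len(baseline)


def compare_cues(baseline: List[str], runs: Dict[str, List[str]]) -> Dict[str, float]:
    """Compute the flip rate of every conditioned run against the same baseline."""
    return {cue: flip_rate(baseline, answers) for cue, answers in runs.items()}


# Toy, made-up answers to four multiple-choice items (hypothetical data).
baseline = ["A", "B", "C", "D"]
runs = {
    "cultural_cue": ["A", "C", "C", "D"],  # prompt conditioned on a cultural cue
    "placebo_cue": ["B", "B", "C", "A"],   # prompt conditioned on an arbitrary cue
}
rates = compare_cues(baseline, runs)
```

Under this sketch, a placebo flip rate comparable to the cultural flip rate would suggest that the observed variation cannot be attributed to cultural conditioning alone.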