Large language models (LLMs) are increasingly being adapted to achieve task specificity for deployment in real-world decision systems. Several prior works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that the fairness of pre-trained masked language models has limited effect on the fairness of those models once fine-tuned. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon, and Llama models are strongly correlated (ρ ≥ 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (ρ ≥ 0.92), and when few-shot length and stereotypical composition are varied (ρ ≥ 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
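The core analysis the abstract describes is a rank correlation between a model's intrinsic bias and its bias under prompt adaptation. Below is a minimal sketch of that comparison, not the authors' code: it assumes per-example bias scores are already computed for a pronoun co-reference task (e.g., a stereotype-vs-anti-stereotype probability gap on WinoBias-style sentences, which is an assumed metric here), and the variable names are illustrative placeholders.

```python
# Sketch: Spearman's rho between intrinsic bias of a pre-trained model and
# bias of the same model under zero-/few-shot prompting. Assumes bias scores
# per evaluation template are given; the paper may define its metric differently.
from scipy.stats import spearmanr

# Hypothetical per-template bias scores for one model.
intrinsic_bias = [0.31, 0.12, 0.45, 0.08, 0.27, 0.39]   # pre-trained model
prompted_bias  = [0.29, 0.15, 0.48, 0.05, 0.30, 0.41]   # same model, few-shot prompted

rho, p_value = spearmanr(intrinsic_bias, prompted_bias)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A rho near 1 indicates intrinsic bias transfers largely intact to the
# prompt-adapted setting, consistent with the reported rho >= 0.94.
```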