Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which draw on external knowledge sources to enhance the generative capabilities of LLMs, remain susceptible to the same bias-related challenges. This work focuses on evaluating and understanding the social bias implications of RAG. Through extensive experiments across various retrieval corpora, LLMs, and bias evaluation datasets, encompassing more than 13 different bias types, we surprisingly observe a reduction in bias under RAG. This suggests that the inclusion of external context can help counteract stereotype-driven predictions, potentially improving fairness by diversifying the contextual grounding of the model's outputs. To better understand this phenomenon, we then probe the model's reasoning process by integrating Chain-of-Thought (CoT) prompting into RAG while assessing the faithfulness of the model's CoT. Our experiments reveal that the model's bias inclinations shift between stereotype and anti-stereotype responses as more contextual information is incorporated from the retrieved documents. Interestingly, we find that while CoT improves accuracy, it increases overall bias across datasets, in contrast to the bias reduction observed with RAG, highlighting the need for bias-aware reasoning frameworks that can mitigate this trade-off.
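To make the setup concrete, the sketch below shows one plausible way to combine retrieved passages with a CoT instruction before querying an LLM, and to vary the amount of retrieved context via a top-k parameter. The `retrieve` and `generate` callables, the prompt template, and the parameter names are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (not the authors' code) of a RAG + Chain-of-Thought pipeline.
# `retrieve` and `generate` are hypothetical stand-ins for a retriever and an
# LLM client; the prompt wording is an illustrative assumption.

from typing import Callable, List


def rag_cot_prompt(question: str, passages: List[str], use_cot: bool = True) -> str:
    """Assemble a prompt that grounds the question in retrieved context and,
    optionally, asks the model to reason step by step."""
    context = "\n\n".join(f"[Doc {i + 1}] {p}" for i, p in enumerate(passages))
    cot = "Let's think step by step, citing the documents you rely on.\n" if use_cot else ""
    return (f"Context:\n{context}\n\n"
            f"Question: {question}\n"
            f"{cot}Answer:")


def answer_with_rag_cot(question: str,
                        retrieve: Callable[[str, int], List[str]],
                        generate: Callable[[str], str],
                        k: int = 5,
                        use_cot: bool = True) -> str:
    """Retrieve k passages, build the RAG(+CoT) prompt, and query the model.
    Sweeping k is one way to study how added context shifts bias inclinations,
    and toggling use_cot isolates the effect of CoT on accuracy and bias."""
    passages = retrieve(question, k)
    return generate(rag_cot_prompt(question, passages, use_cot))
```

Under this framing, the paper's comparison amounts to evaluating a bias benchmark with `use_cot=False` versus `use_cot=True` at varying `k`, so RAG's bias-reducing effect and CoT's bias-increasing effect can be measured separately.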