Large language models (LLMs), despite their remarkable capabilities, are susceptible to generating biased and discriminatory responses. As LLMs increasingly influence high-stakes decision-making (e.g., hiring and healthcare), mitigating these biases becomes critical. In this work, we propose a causality-guided debiasing framework to tackle social biases, aiming to reduce the objectionable dependence between LLMs' decisions and the social information in the input. Our framework introduces a novel perspective to identify how social information can affect an LLM's decision through different causal pathways. Leveraging these causal insights, we outline principled prompting strategies that regulate these pathways through selection mechanisms. This framework not only unifies existing prompting-based debiasing techniques, but also opens up new directions for reducing bias by encouraging the model to prioritize fact-based reasoning over reliance on biased social cues. We validate our framework through extensive experiments on real-world datasets across multiple domains, demonstrating its effectiveness in debiasing LLM decisions, even with only black-box access to the model.
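To make the prompting-based idea concrete, the following is a minimal sketch, not the paper's actual prompts or selection mechanisms: it only illustrates the general pattern of instructing a black-box LLM to ground its decision in task-relevant facts rather than social attributes. The function `query_llm` is a hypothetical stand-in for whatever black-box chat/completion API is available.

```python
# Minimal illustrative sketch of prompting-based debiasing under black-box access.
# Assumption: `query_llm` is a hypothetical placeholder for a provider API call;
# the instruction text below is illustrative, not the paper's prompt.

def query_llm(prompt: str) -> str:
    """Hypothetical black-box LLM call; replace with your provider's API."""
    raise NotImplementedError("plug in an actual LLM API here")

DEBIAS_INSTRUCTION = (
    "Make the decision for the task below using only the task-relevant facts "
    "provided. Do not let social attributes such as gender, race, age, or "
    "nationality influence your decision."
)

def debiased_decision(task_description: str, candidate_profile: str) -> str:
    # Prepend the debiasing instruction so the model is steered toward
    # fact-based reasoning rather than biased social cues in the input.
    prompt = (
        f"{DEBIAS_INSTRUCTION}\n\n"
        f"Task: {task_description}\n"
        f"Input: {candidate_profile}\n"
        f"Decision (yes/no) with a brief fact-based justification:"
    )
    return query_llm(prompt)
```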