Causal Bias Detection in Generative Artificial Intelligence

Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.
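As a rough illustration of the two axes of decomposition mentioned above, the following minimal sketch uses a toy linear structural causal model with a protected attribute X, a mediator W, and an outcome Y. It is not the paper's estimator or its decomposition: the linear mechanisms, the absence of confounding, the coefficient values, and every function name are assumptions made for illustration only. It splits the total effect of X on Y into direct and indirect parts, then attributes the change in disparity to swapping each real-world mechanism for a hypothetical generative model's version.

```python
import numpy as np

def make_mechanisms(a, b, c):
    """Hypothetical linear mechanisms: W = a*X + U_W, Y = b*X + c*W + U_Y."""
    f_w = lambda x, u_w: a * x + u_w
    f_y = lambda x, w, u_y: b * x + c * w + u_y
    return f_w, f_y

def mean_outcome(f_w, f_y, x_direct, x_mediator, u_w, u_y):
    """Mean of Y when X is set to x_direct on the path X -> Y
    and to x_mediator on the path X -> W -> Y."""
    w = f_w(x_mediator, u_w)
    return float(np.mean(f_y(x_direct, w, u_y)))

def decompose(f_w, f_y, u_w, u_y):
    """Split the total effect of X on Y into direct and indirect parts."""
    y00 = mean_outcome(f_w, f_y, 0, 0, u_w, u_y)
    y10 = mean_outcome(f_w, f_y, 1, 0, u_w, u_y)
    y11 = mean_outcome(f_w, f_y, 1, 1, u_w, u_y)
    return {"total": y11 - y00, "direct": y10 - y00, "indirect": y11 - y10}

# Shared exogenous noise, so every comparison uses the same units.
rng = np.random.default_rng(0)
u_w = rng.normal(size=10_000)
u_y = rng.normal(size=10_000)

# Assumed coefficients: "real world" versus the model's implicit beliefs.
real_f_w, real_f_y = make_mechanisms(a=0.5, b=0.2, c=1.0)
gen_f_w, gen_f_y = make_mechanisms(a=0.8, b=0.1, c=1.0)

real = decompose(real_f_w, real_f_y, u_w, u_y)
hybrid = decompose(gen_f_w, real_f_y, u_w, u_y)  # only f_W replaced
gen = decompose(gen_f_w, gen_f_y, u_w, u_y)      # f_W and f_Y replaced

# Change in total disparity attributable to each replaced mechanism.
shift_from_f_w = hybrid["total"] - real["total"]
shift_from_f_y = gen["total"] - hybrid["total"]

print("real-world decomposition:", real)
print("generative decomposition:", gen)
print("shift from replacing f_W:", shift_from_f_w)
print("shift from replacing f_Y:", shift_from_f_y)
```

Replacing the mechanisms one at a time makes the two shifts sum to the full gap between the real-world and generative total effects, but the attribution depends on the replacement order chosen here, and the sketch says nothing about identification from observational data.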

