Automated systems built on artificial intelligence (AI) are increasingly deployed across high-stakes domains, raising critical concerns about fairness and the perpetuation of demographic disparities that exist in the world. In this context, causal inference provides a principled framework for reasoning about fairness, as it links observed disparities to underlying mechanisms and aligns naturally with human intuition and legal notions of discrimination. Prior work on causal fairness primarily focuses on the standard machine learning setting, where a decision-maker constructs a single predictive mechanism $f_{\widehat Y}$ for an outcome variable $Y$, while inheriting the causal mechanisms of all other covariates from the real world. The generative AI setting, however, is markedly more complex: generative models can sample from arbitrary conditionals over any set of variables, implicitly constructing their own beliefs about all causal mechanisms rather than learning a single predictive function. This fundamental difference requires new developments in causal fairness methodology. We formalize the problem of causal fairness in generative AI and unify it with the standard ML setting under a common theoretical framework. We then derive new causal decomposition results that enable granular quantification of fairness impacts along both (a) different causal pathways and (b) the replacement of real-world mechanisms by the generative model's mechanisms. We establish identification conditions and introduce efficient estimators for causal quantities of interest, and demonstrate the value of our methodology by analyzing race and gender bias in large language models across different datasets.