In this study, we introduce the application of causal disparity analysis to uncover the intricate relationships and causal pathways between sensitive attributes and target outcomes in real-world observational data. Our methodology employs causal decomposition analysis to quantify and examine the causal interplay between sensitive attributes and outcomes. We also emphasize the importance of integrating heterogeneity assessment into causal disparity analysis to gain deeper insight into how sensitive attributes affect outcomes within specific sub-groups. Our two-step investigation focuses on datasets in which race serves as the sensitive attribute. Results on two datasets show the benefit of leveraging causal analysis and heterogeneity assessment not only for quantifying biases in the data but also for disentangling their influences on outcomes. We demonstrate that the sub-groups our approach identifies as most affected by disparities are also those with the largest ML classification errors. We further show that grouping the data by a sensitive attribute alone is insufficient; through these analyses, we can find sub-groups that are directly affected by disparities. We hope that our findings will encourage the adoption of such methodologies in future ethical AI practices and bias audits, fostering a more equitable and fair technological landscape.
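To illustrate the kind of quantity causal decomposition analysis targets, the sketch below decomposes an observed outcome disparity between two groups into a direct component and a component mediated by an intermediate variable. The data are synthetic and the linear-model decomposition (the classic omitted-variable identity for OLS) is an illustrative assumption, not the paper's actual estimator; the variable names `A` (sensitive attribute), `M` (mediator), and `Y` (outcome) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data: A = sensitive attribute (binary group indicator),
# M = mediator (e.g. access to a resource), Y = outcome.
A = rng.integers(0, 2, n).astype(float)
M = 0.5 * A + rng.normal(0, 1, n)             # A influences M
Y = 1.0 * A + 2.0 * M + rng.normal(0, 1, n)   # A influences Y directly and via M

# Total observed disparity: E[Y | A=1] - E[Y | A=0]
total = Y[A == 1].mean() - Y[A == 0].mean()

# Effect of A on the mediator (group mean difference in M).
a_on_m = M[A == 1].mean() - M[A == 0].mean()

# Regress Y on A and M jointly: the coefficient on A is the direct
# component; the coefficient on M, times a_on_m, is the mediated one.
X = np.column_stack([np.ones(n), A, M])
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
direct, m_coef = beta[1], beta[2]
indirect = a_on_m * m_coef

print(f"total disparity: {total:.3f}")
print(f"direct: {direct:.3f}, mediated: {indirect:.3f}")
```

Under OLS the decomposition is exact in-sample (total = direct + mediated), which makes it a convenient sanity check even though real causal decomposition analyses must additionally defend the identification assumptions behind each component.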