Algorithmic fairness and the explainability of potentially unfair outcomes are crucial for establishing trust in, and accountability of, artificial intelligence systems in domains such as healthcare and policing. Although significant advances have been made in each field separately, achieving explainability in fairness applications remains challenging, particularly where deep neural networks are used. At the same time, ethical data mining has become ever more relevant, as fairness-unaware algorithms have repeatedly been shown to produce biased outcomes. Current approaches focus on mitigating biases in model outcomes, but few attempt to explain \emph{why} a model is biased. To bridge this gap, we propose a comprehensive approach that leverages optimal transport theory to uncover the causes and implications of biased regions in images, and that extends readily to tabular data. Using Wasserstein barycenters, we obtain scores that are independent of a sensitive variable while preserving their marginal orderings. This step not only ensures predictive accuracy but also helps us recover the regions most associated with the generation of the biases. Our findings have significant implications for the development of trustworthy and unbiased AI systems, fostering transparency, accountability, and fairness in critical decision-making scenarios across diverse domains.
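
To make the barycenter step concrete, the following is a minimal sketch of one common way to build such scores for a one-dimensional model output; it is an illustration under stated assumptions, not necessarily the exact procedure used in this work. Each score is mapped to its rank within its own sensitive group and then replaced by the group-weighted average of every group's empirical quantile at that rank. The function name `barycenter_scores`, the use of empirical ranks, and the toy data are all assumptions introduced here.

```python
from bisect import bisect_right
from collections import defaultdict


def barycenter_scores(scores, groups):
    """Map 1D scores to a group-weighted quantile average (illustrative sketch).

    Assumption: scores are one-dimensional and the sensitive variable is
    discrete. Each score is replaced by the weighted average, over all
    groups, of the empirical quantile at the score's within-group rank.
    """
    by_group = defaultdict(list)
    for s, g in zip(scores, groups):
        by_group[g].append(s)
    sorted_by_group = {g: sorted(v) for g, v in by_group.items()}

    n_total = len(scores)
    # Barycenter weights: empirical share of each sensitive group.
    weights = {g: len(v) / n_total for g, v in sorted_by_group.items()}

    fair = []
    for s, g in zip(scores, groups):
        own = sorted_by_group[g]
        k = bisect_right(own, s)  # rank of s within its own group (1..n_g)
        n_g = len(own)
        value = 0.0
        for h, other in sorted_by_group.items():
            n_h = len(other)
            # Quantile of group h at level k / n_g, using integer
            # arithmetic for ceil(k * n_h / n_g) - 1 to avoid rounding issues.
            idx = -(-k * n_h // n_g) - 1
            value += weights[h] * other[idx]
        fair.append(value)
    return fair


# Toy data (hypothetical): group "b" receives systematically higher scores.
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8]
groups = ["a", "a", "a", "b", "b", "b"]
fair = barycenter_scores(scores, groups)
```

In this toy example both groups end up with the same set of transformed scores, so the scores no longer carry information about the group, while the ordering of individuals within each group is unchanged.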