Ensuring trust and accountability in Artificial Intelligence systems demands that their outcomes be explainable. Despite significant progress in Explainable AI, human biases still taint a substantial portion of the data these systems are trained on, raising concerns about unfair or discriminatory behavior. Current approaches in Algorithmic Fairness focus on mitigating such biases in the outcomes of a model, but few attempts have been made to explain \emph{why} a model is biased. To bridge this gap between the two fields, we propose a comprehensive approach that uses optimal transport theory to uncover the causes of discrimination in Machine Learning applications, with a particular emphasis on image classification. We leverage Wasserstein barycenters to achieve fair predictions and introduce an extension to pinpoint bias-associated regions. This allows us to derive a cohesive system which uses the enforced fairness to measure each feature's influence \emph{on} the bias. By exploiting this interplay between enforcing and explaining fairness, our method holds significant implications for the development of trustworthy and unbiased AI systems, fostering transparency, accountability, and fairness in critical decision-making scenarios across diverse domains.