Even when deployed with the best intentions, machine learning methods can perpetuate, amplify, or even create social biases. Measures of (un-)fairness have been proposed to gauge the (non-)discriminatory nature of machine learning models. However, proxies of protected attributes that cause discriminatory effects remain challenging to address. In this work, we propose a new algorithmic approach that measures group-wise demographic parity violations and allows us to inspect the causes of inter-group discrimination. Our method relies on the novel idea of measuring the dependence of a model on the protected attribute in the explanation space, an informative space that allows for more sensitive audits than the primary space of input data or prediction distributions and that supports theoretical guarantees for demographic parity auditing. We provide a mathematical analysis, synthetic examples, and an experimental evaluation on real-world data. We release an open-source Python package with methods, routines, and tutorials.
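As a rough illustration of auditing in the explanation space, the sketch below compares two linear models on synthetic data: one that uses a proxy of the protected attribute and one that does not. It is a minimal sketch under stated assumptions, not the released package or the authors' procedure. The explanations are the per-feature contributions of a linear model, the audit is a least-squares probe that tries to recover the protected attribute from those explanations, and all function names and the data-generating process are hypothetical.

```python
import numpy as np

def demographic_parity_gap(y_pred, z):
    """Absolute difference in positive-prediction rates between groups z=1 and z=0."""
    y_pred = np.asarray(y_pred, dtype=float)
    return abs(y_pred[z == 1].mean() - y_pred[z == 0].mean())

def linear_explanations(X, w):
    """Per-feature contributions w_j * (x_j - mean_j) of a linear model (illustrative choice)."""
    return (X - X.mean(axis=0)) * w

def auc(scores, labels):
    """AUC from the Mann-Whitney rank statistic (assumes continuous scores, no ties)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n1 = labels.sum()
    n0 = len(labels) - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

def audit_auc(E, z, seed=0):
    """Fit a linear probe for z on half of the explanations; return AUC on the other half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(z))
    half = len(z) // 2
    train, test = idx[:half], idx[half:]
    A = np.column_stack([E, np.ones(len(E))])  # explanations plus an intercept column
    coef, *_ = np.linalg.lstsq(A[train], z[train].astype(float), rcond=None)
    return auc(A[test] @ coef, z[test])

# Hypothetical synthetic data: x0 is a proxy of the protected attribute z, x1 is independent noise.
rng = np.random.default_rng(42)
n = 4000
z = rng.integers(0, 2, size=n)
X = np.column_stack([2.0 * z + rng.normal(size=n), rng.normal(size=n)])

w_proxy = np.array([1.0, 0.5])  # model that relies on the proxy feature
w_clean = np.array([0.0, 0.5])  # model that ignores the proxy feature

def predict(X, w):
    """Binary predictions obtained by thresholding the linear score at its median."""
    s = X @ w
    return (s > np.median(s)).astype(int)

gap_proxy = demographic_parity_gap(predict(X, w_proxy), z)
gap_clean = demographic_parity_gap(predict(X, w_clean), z)
auc_proxy = audit_auc(linear_explanations(X, w_proxy), z)
auc_clean = audit_auc(linear_explanations(X, w_clean), z)

print(f"proxy model: DP gap={gap_proxy:.2f}, audit AUC={auc_proxy:.2f}")
print(f"clean model: DP gap={gap_clean:.2f}, audit AUC={auc_clean:.2f}")
```

In this toy setting, an audit AUC close to 0.5 suggests the explanations carry little information about the protected attribute, while a value well above 0.5 flags a dependence. The probe's coefficients then hint at which feature contributions drive it, which is the kind of inspection the abstract describes.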