Machine learning algorithms that aid human decision-making may inadvertently discriminate against certain protected groups. We formalize direct discrimination as a direct causal effect of the protected attributes on the decisions, and induced discrimination as a change in the causal influence of non-protected features associated with the protected attributes. Measurements of marginal direct effect (MDE) and SHapley Additive exPlanations (SHAP) reveal that state-of-the-art fair learning methods can induce discrimination via association or reverse discrimination in synthetic and real-world datasets. To inhibit discrimination in algorithmic systems, we propose to nullify the influence of the protected attribute on the output of the system while preserving the influence of the remaining features. We introduce and study post-processing methods that achieve these objectives, finding that they yield relatively high model accuracy, prevent direct discrimination, and diminish various disparity measures, e.g., demographic disparity.
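The idea of nullifying the protected attribute's influence while preserving the influence of the remaining features can be illustrated with a minimal sketch. The abstract does not specify the post-processing procedure, so the approach below (averaging a fitted model's predictions over the empirical distribution of the protected attribute) is an assumption chosen for illustration, not necessarily the authors' method. The names `marginalize_protected` and `toy_predict` are hypothetical.

```python
import numpy as np


def marginalize_protected(predict, X, protected_col):
    """Average predictions over interventions on the protected column.

    Assumption: this marginalization is one plausible way to remove the
    direct effect of the protected attribute; it is an illustration only.
    """
    X = np.asarray(X, dtype=float)
    # Empirical distribution of the protected attribute in X.
    values, counts = np.unique(X[:, protected_col], return_counts=True)
    weights = counts / counts.sum()

    out = np.zeros(X.shape[0])
    for value, weight in zip(values, weights):
        X_do = X.copy()
        X_do[:, protected_col] = value  # set the attribute for every row
        out += weight * predict(X_do)
    return out


def toy_predict(X):
    """Hypothetical scorer that depends directly on the protected column 1."""
    return 0.5 * X[:, 0] + 2.0 * X[:, 1]


# Rows 0/1 and rows 2/3 differ only in the protected attribute (column 1).
X = np.array([[1.0, 0.0], [1.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
scores = marginalize_protected(toy_predict, X, protected_col=1)
```

In this toy case, rows that differ only in the protected attribute receive the same post-processed score, while the score still varies with the non-protected feature in column 0.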