Fairness is steadily becoming a crucial requirement of Machine Learning (ML) systems. A particularly important notion is subgroup fairness, i.e., fairness in subgroups of individuals that are defined by more than one attribute. Identifying bias in subgroups can be computationally challenging, and the findings can be difficult for end users to comprehend and interpret intuitively. In this work we focus on the latter aspect: we propose an explainability method tailored to identifying potential bias in subgroups and visualizing the findings in a user-friendly manner. In particular, we extend the Accumulated Local Effects (ALE) plots explainability method and propose FALE (Fairness-aware Accumulated Local Effects) plots, a method for measuring the change in fairness for an affected population corresponding to different values of a feature (attribute). We envision FALE as an efficient, user-friendly, comprehensible, and reliable first-stage tool for identifying subgroups with potential bias issues.