Fairness in machine learning has attracted significant attention due to its widespread application in high-stakes decision-making tasks. Unregulated machine learning classifiers can exhibit bias towards certain demographic groups in data, so quantifying and mitigating classifier bias is a central concern in fair machine learning. In this paper, we aim to quantify the influence of different features in a dataset on the bias of a classifier. To do this, we introduce the Fairness Influence Function (FIF), which decomposes bias into components attributable to individual features and to intersections of multiple features. The key idea is to represent existing group fairness metrics as the difference of the scaled conditional variances in the classifier's prediction and to apply a variance decomposition from global sensitivity analysis. To estimate FIFs, we instantiate an algorithm, FairXplainer, that applies variance decomposition to the classifier's prediction following local regression. Experiments demonstrate that FairXplainer captures FIFs of individual and intersectional features, provides a better approximation of bias based on FIFs, demonstrates higher correlation of FIFs with fairness interventions, and detects changes in bias due to fairness affirmative/punitive actions in the classifier. The code is available at https://github.com/ReAILe/bias-explainer.
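To make the key idea concrete, the following is a minimal sketch, not the FairXplainer implementation. It assumes binary predictions and a binary sensitive attribute, and it uses statistical parity as the group fairness metric. For a binary prediction with p = Pr[Y_hat = 1 | A = a], the conditional variance is p(1 - p), so dividing it by (1 - p) recovers p; statistical parity is then a difference of two such scaled conditional variances. The helper names (`statistical_parity`, `scaled_conditional_variance`, `first_order_term`) are hypothetical, and the first-order term is estimated by plain empirical grouping on a discrete feature rather than by local regression.

```python
import numpy as np


def statistical_parity(y_pred, group):
    """Pr[Y_hat=1 | A=1] - Pr[Y_hat=1 | A=0] for binary predictions."""
    y = np.asarray(y_pred, dtype=float)
    a = np.asarray(group)
    return y[a == 1].mean() - y[a == 0].mean()


def scaled_conditional_variance(y_pred, group, value):
    """Var[Y_hat | A=value] / (1 - Pr[Y_hat=1 | A=value]).

    For binary Y_hat the variance is p * (1 - p), so the ratio equals p.
    """
    y = np.asarray(y_pred, dtype=float)[np.asarray(group) == value]
    p = y.mean()
    if p >= 1.0:
        # Variance and scaling factor are both zero; return the limit value.
        return 1.0
    return y.var() / (1.0 - p)  # population variance (ddof=0)


def first_order_term(y_pred, feature):
    """Empirical Var(E[Y_hat | X_i]) for a discrete feature X_i.

    This is the first-order term of a variance decomposition; the
    remainder Var[Y_hat] - Var(E[Y_hat | X_i]) is E[Var(Y_hat | X_i)].
    """
    y = np.asarray(y_pred, dtype=float)
    x = np.asarray(feature)
    overall = y.mean()
    total = 0.0
    for v in np.unique(x):
        mask = x == v
        total += mask.mean() * (y[mask].mean() - overall) ** 2
    return total


# Toy data (made up for illustration only).
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
feature = np.array([0, 0, 1, 1, 0, 0, 1, 1])

sp = statistical_parity(y_pred, group)
sp_from_variance = (
    scaled_conditional_variance(y_pred, group, 1)
    - scaled_conditional_variance(y_pred, group, 0)
)
print(sp, sp_from_variance)               # the two values agree
print(first_order_term(y_pred, feature))  # share of Var[Y_hat] explained by the feature alone
```

Applying a decomposition like `first_order_term` separately within each sensitive group, and extending it to subsets of features, is one way to read the attribution of bias to individual and intersectional features described above; the estimator used in the paper may differ.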