It is widely held that one cause of downstream bias in classifiers is bias present in the training data. Rectifying such biases may involve context-dependent interventions, such as training separate models on subgroups, removing features whose collection process is biased, or even conducting real-world experiments to identify sources of bias. Despite the need for such data bias investigations, few automated methods exist to assist practitioners in these efforts. In this paper, we present one such method. Given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ from $X$, it outputs a tuple $(f_j, g)$ with the following property: $g$ corresponds to a subset of the training dataset $(X, y)$ such that the $j^{th}$ feature $f_j$ has much larger (or smaller) influence in the subgroup $g$ than in the dataset overall. We call this gap feature importance disparity (FID). Across $4$ datasets and $4$ common feature importance methods of broad interest to the machine learning community, we show that we can efficiently find subgroups with large FID values, even over exponentially large subgroup classes, and that in practice these subgroups exhibit potentially serious bias issues as measured by standard fairness metrics.
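To make the notion of feature importance disparity concrete, the sketch below scores a (feature, subgroup) pair by the ratio of the feature's mean local importance inside the subgroup to its mean over the whole dataset, and then brute-forces over subgroups defined by a single protected attribute taking a single value. Several choices here are assumptions for illustration only. The ratio form of FID, the use of $|\text{coef}_j \cdot x_{ij}|$ as the local importance for a linear $h$, the minimum group-size fraction, and all function names (`linear_local_importance`, `fid_ratio`, `search_single_attribute_groups`) are hypothetical. The sketch does not reproduce the paper's search over exponentially large subgroup classes.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]
Result = Tuple[float, int, Tuple[int, float]]  # (FID, feature j, (protected index, value))


def linear_local_importance(X: Sequence[Sequence[float]], coef: Sequence[float]) -> Matrix:
    """Hypothetical local importance for a linear regressor: |coef_j * x_ij| per sample."""
    return [[abs(c * x) for c, x in zip(coef, row)] for row in X]


def fid_ratio(importance: Matrix, in_group: Sequence[bool], j: int) -> float:
    """Assumed FID form: mean importance of feature j in the group / mean over all rows."""
    overall = sum(row[j] for row in importance) / len(importance)
    members = [row[j] for row, m in zip(importance, in_group) if m]
    if not members or overall == 0:
        raise ValueError("empty group or feature with zero overall importance")
    return (sum(members) / len(members)) / overall


def search_single_attribute_groups(
    X: Sequence[Sequence[float]],
    importance: Matrix,
    protected: Sequence[int],
    min_frac: float = 0.1,
) -> Optional[Result]:
    """Brute force over groups {x : x[p] == v} for each protected index p and value v."""
    n, d = len(X), len(X[0])
    best: Optional[Result] = None
    for p in protected:
        for v in sorted({row[p] for row in X}):
            mask = [row[p] == v for row in X]
            if sum(mask) < min_frac * n:
                continue  # skip tiny groups, whose averages are unstable
            for j in range(d):
                if all(row[j] == 0 for row in importance):
                    continue  # the ratio is undefined for features that never matter
                r = fid_ratio(importance, mask, j)
                if best is None or r > best[0]:
                    best = (r, j, (p, v))
    return best


# Toy data: column 0 is a binary protected attribute; h is linear with these coefficients.
X = [[0, 1.0, 2.0], [0, 1.0, 2.0], [1, 5.0, 2.0], [1, 5.0, 2.0]]
coef = [0.0, 1.0, 1.0]
imp = linear_local_importance(X, coef)
best = search_single_attribute_groups(X, imp, protected=[0], min_frac=0.25)
print(best)  # (1.6666666666666667, 1, (0, 1))
```

On this toy data, feature 1 has mean importance 5 in the group with protected value 1 versus 3 overall, giving a ratio of 5/3. Searching for the "smaller influence" case described above would amount to minimizing the same ratio instead of maximizing it.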