In recent years, there has been a flurry of research on the fairness of machine learning models, and in particular on quantifying and eliminating bias against protected subgroups. One line of work generalizes the notion of protected subgroups beyond simple discrete classes by introducing the notion of a "rich subgroup", and seeks to train models that are calibrated or that equalize error rates with respect to these richer subgroup classes. Largely orthogonally, local model explanation methods have been developed that, given a classifier h and a test point x, attribute influence for the prediction h(x) to the individual features of x. This raises a natural question: do local model explanation methods attribute different feature importance values on average across different protected subgroups, and can we detect these disparities efficiently? If the model places high weight on a given feature in a specific protected subgroup, but not on the dataset overall (or vice versa), this could indicate bias in the predictive model or the underlying data generating process, and is at the very least a useful diagnostic that signals the need for a domain expert to delve deeper. In this paper, we formally introduce the notion of feature importance disparity (FID) in the context of rich subgroups, design oracle-efficient algorithms to identify large FID subgroups, and conduct a thorough empirical analysis that establishes auditing for FID as an important method to investigate dataset bias. Our experiments show that across 4 datasets and 4 common feature importance methods, our algorithms find (feature, subgroup) pairs that simultaneously (i) have subgroup feature importance that is often an order of magnitude different from the importance on the dataset as a whole, (ii) generalize out of sample, and (iii) yield interesting discussions about potential bias inherent in these datasets.
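As a rough illustration of the quantity being audited, the sketch below compares the average local importance of one feature inside a subgroup with its average over the whole dataset. The function name `mean_importance_gap`, the toy importance values, and the use of a plain difference of means are assumptions made for illustration only; the formal FID definition and the oracle-efficient search over rich subgroups are not reproduced here.

```python
from statistics import fmean


def mean_importance_gap(importances, in_subgroup, feature):
    """Return (subgroup mean) - (dataset mean) of one feature's local importance.

    importances: list of rows, one per data point; row[j] is the importance that
                 some local explanation method assigned to feature j at that point.
    in_subgroup: list of booleans marking which points belong to the subgroup.
    feature:     column index of the feature being audited.
    """
    overall = fmean(row[feature] for row in importances)
    members = [row[feature] for row, g in zip(importances, in_subgroup) if g]
    if not members:
        raise ValueError("subgroup is empty")
    return fmean(members) - overall


# Hypothetical importance values for 4 points and 2 features.
importances = [
    [0.5, 0.1],
    [0.5, 0.3],
    [0.0, 0.2],
    [0.0, 0.2],
]
in_subgroup = [True, True, False, False]

gap0 = mean_importance_gap(importances, in_subgroup, 0)  # feature 0 matters more in the subgroup
gap1 = mean_importance_gap(importances, in_subgroup, 1)  # feature 1 has the same mean importance
print(gap0, gap1)
```

A large absolute gap for some (feature, subgroup) pair is the kind of signal the abstract describes as warranting a closer look by a domain expert. This sketch evaluates a single fixed subgroup and does not search over a subgroup class.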