Model Explanation Disparities as a Fairness Diagnostic

In recent years, there has been a flurry of research on the fairness of machine learning models, and in particular on quantifying and eliminating bias against protected subgroups. One line of work generalizes the notion of protected subgroups beyond simple discrete classes by introducing the notion of a "rich subgroup", and seeks to train models that are calibrated or that equalize error rates with respect to these richer subgroup classes. Largely orthogonally, local model explanation methods have been developed that, given a classifier h and a test point x, attribute influence for the prediction h(x) to the individual features of x. This raises a natural question: do local model explanation methods attribute different feature importance values on average across different protected subgroups, and can we detect these disparities efficiently? If the model places high weight on a given feature in a specific protected subgroup, but not on the dataset overall (or vice versa), this could indicate bias in the predictive model or in the underlying data-generating process, and is at the very least a useful diagnostic that signals the need for a domain expert to investigate further. In this paper, we formally introduce the notion of feature importance disparity (FID) in the context of rich subgroups, design oracle-efficient algorithms to identify large-FID subgroups, and conduct a thorough empirical analysis that establishes auditing for FID as an important method for investigating dataset bias. Our experiments show that, across four datasets and four common feature importance methods, our algorithms find (feature, subgroup) pairs that simultaneously (i) have subgroup feature importance that is often an order of magnitude different from the importance on the dataset as a whole, (ii) generalize out of sample, and (iii) yield interesting discussions about potential bias inherent in these datasets.
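To make the central quantity concrete, the following is a minimal sketch of one plausible reading of FID: the absolute gap between a feature's average local importance inside a subgroup and its average over the whole dataset. The abstract does not give the formal definition, so the function name `feature_importance_disparity`, the use of an absolute difference, and the toy importance values are all assumptions made for illustration. The importance matrix would in practice come from a local explanation method, and the sketch only evaluates a given subgroup rather than searching over rich subgroups as the paper's algorithms do.

```python
import numpy as np


def feature_importance_disparity(importances, in_subgroup):
    """Per-feature gap between subgroup and overall mean importance.

    importances: (n_points, n_features) array of local attributions,
        assumed to be precomputed by some explanation method.
    in_subgroup: (n_points,) boolean mask marking subgroup membership.
    Returns an (n_features,) array of absolute differences.
    NOTE: this is an illustrative reading of FID, not the paper's
    formal definition.
    """
    importances = np.asarray(importances, dtype=float)
    in_subgroup = np.asarray(in_subgroup, dtype=bool)
    if importances.ndim != 2:
        raise ValueError("importances must be 2-dimensional")
    if in_subgroup.shape != (importances.shape[0],):
        raise ValueError("mask must have one entry per data point")
    if not in_subgroup.any():
        raise ValueError("subgroup is empty")

    overall_mean = importances.mean(axis=0)
    subgroup_mean = importances[in_subgroup].mean(axis=0)
    return np.abs(subgroup_mean - overall_mean)


# Toy data (hypothetical): 4 points, 2 features.
# Feature 0 matters much more inside the subgroup than overall.
importances = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.1],
    [0.2, 0.2],
])
in_subgroup = np.array([True, True, False, False])

fid = feature_importance_disparity(importances, in_subgroup)
most_disparate_feature = int(np.argmax(fid))
```

In this toy case the disparity is concentrated on feature 0, which is the kind of (feature, subgroup) pair an audit would flag for closer inspection by a domain expert.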

