As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual features influence model outcomes is crucial for building interpretable and equitable models. While feature importance metrics for accuracy are well established, methods for assessing feature contributions to fairness remain underexplored. We propose two model-agnostic approaches to measure fair feature importance. First, we compare model fairness before and after permuting a feature's values. This simple intervention-based approach decouples a feature from the model's predictions to measure the feature's contribution to the trained model's fairness. Second, we evaluate the fairness of models trained with and without a given feature. This occlusion-based score admits dramatic computational savings via minipatch learning. Our empirical results demonstrate the simplicity and effectiveness of both metrics across multiple predictive tasks. Together, these methods offer simple, scalable, and interpretable tools for quantifying the influence of features on fairness, supporting responsible machine learning development.
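The two scores described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's exact formulation: it uses the demographic-parity gap as the fairness metric, the function names (`fair_permutation_importance`, `fair_occlusion_importance`) and the sign convention (positive means the feature contributes to unfairness) are our own assumptions, and the minipatch speedup for the occlusion score is omitted.

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute gap in positive-prediction rates between two groups (0/1)."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

def fair_permutation_importance(predict, X, group, feature, n_repeats=20, seed=0):
    """Intervention-based sketch: gap(original) minus the mean gap after
    permuting one feature's column. A positive score (our convention) means
    breaking the feature-prediction link reduces the disparity, i.e. the
    feature contributes to the model's unfairness."""
    rng = np.random.default_rng(seed)
    base = demographic_parity_gap(predict(X), group)
    gaps = []
    for _ in range(n_repeats):
        Xp = X.copy()
        rng.shuffle(Xp[:, feature])  # permute in place; decouples the feature
        gaps.append(demographic_parity_gap(predict(Xp), group))
    return base - float(np.mean(gaps))

def fair_occlusion_importance(train, X, y, group, feature):
    """Occlusion-based sketch: gap of a model trained on all features minus
    the gap of a model retrained with the given feature removed.
    `train(X, y)` is assumed to return a predict function."""
    X_red = np.delete(X, feature, axis=1)
    gap_full = demographic_parity_gap(train(X, y)(X), group)
    gap_red = demographic_parity_gap(train(X_red, y)(X_red), group)
    return gap_full - gap_red
```

As a sanity check, if predictions are driven by a feature that is perfectly aligned with group membership, both scores for that feature are large and positive, while scores for an independent noise feature are near zero.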