MESD: A Risk-Sensitive Metric for Explanation Fairness Across Intersectional Subgroups

Fairness in machine learning is predominantly evaluated through outcome-oriented metrics, such as demographic parity, which measure whether predictions are statistically consistent across protected groups. However, these metrics cannot detect whether a model uses systematically different reasoning for different demographic groups, which would violate procedural fairness principles. This problem is compounded by intersectionality: a model may appear fair on individual attributes (e.g., race) while exhibiting significant disparities for intersectional subgroups (e.g., race $\times$ gender), a phenomenon known as fairness gerrymandering. In this work, we introduce Multi-category Explanation Stability Disparity (MESD), a procedural fairness metric that quantifies disparities in explanation quality across intersectional subgroups formed by the Cartesian product of multiple protected attributes. MESD integrates three components: label-aware aggregation aligned with outcome-conditional fairness, empirical-Bayes shrinkage to stabilize estimates for small intersectional groups, and Conditional Value-at-Risk (CVaR) weighting to emphasize worst-case subgroup disparities. We integrate MESD within a multi-objective optimization framework (UEF) that jointly optimizes utility, outcome fairness, and procedural fairness using NSGA-II. We evaluate MESD and UEF on three benchmark datasets alongside four state-of-the-art methods across several experiments, and we demonstrate that MESD reveals procedural disparities that are invisible to outcome metrics alone. We position our contribution within procedural justice theory and discuss implications for regulatory compliance and intersectional equity.
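To make the three components concrete, the following is a minimal illustrative sketch and not the paper's exact formulation. It assumes a per-instance explanation stability score is already available, uses a simple n / (n + prior_strength) shrinkage weight, measures each subgroup's disparity as its absolute deviation from the label-conditional mean, and averages the per-label CVaR values. The function names (`shrink`, `cvar`, `mesd_sketch`) and the parameters `alpha` and `prior_strength` are hypothetical.

```python
import math
from collections import defaultdict


def shrink(group_mean, n, global_mean, prior_strength=10.0):
    """Pull a subgroup mean toward the global mean (assumed shrinkage form).

    Small subgroups (small n) are pulled harder than large ones.
    """
    w = n / (n + prior_strength)
    return w * group_mean + (1.0 - w) * global_mean


def cvar(values, alpha=0.8):
    """Mean of the worst (largest) ceil((1 - alpha) * len(values)) entries."""
    if not values:
        return 0.0
    k = max(1, math.ceil((1.0 - alpha) * len(values)))
    worst = sorted(values, reverse=True)[:k]
    return sum(worst) / k


def mesd_sketch(scores, groups, labels, alpha=0.8, prior_strength=10.0):
    """Illustrative MESD-style score (assumed aggregation, not the paper's).

    scores: per-instance explanation stability scores (floats)
    groups: per-instance intersectional subgroup keys, e.g. (race, gender)
    labels: per-instance outcome labels used for label-aware aggregation
    """
    per_label = []
    for y in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == y]
        global_mean = sum(scores[i] for i in idx) / len(idx)

        # Collect scores per intersectional subgroup within this label.
        by_group = defaultdict(list)
        for i in idx:
            by_group[groups[i]].append(scores[i])

        # Disparity of each shrunken subgroup mean from the label-level mean.
        disparities = [
            abs(shrink(sum(v) / len(v), len(v), global_mean, prior_strength)
                - global_mean)
            for v in by_group.values()
        ]
        per_label.append(cvar(disparities, alpha))

    # Label-aware aggregation: average the tail risk over outcome labels.
    return sum(per_label) / len(per_label)
```

In this sketch, subgroup keys are tuples, so the Cartesian product of protected attributes is represented directly, for example `("A", "F")` for one race and gender combination. A larger `alpha` concentrates the score on fewer, worse-off subgroups, while a larger `prior_strength` damps the influence of very small subgroups.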