Explainable Artificial Intelligence (XAI) is becoming increasingly essential for enhancing the transparency of machine learning (ML) models. Among the various XAI techniques, counterfactual explanations (CFs) play a pivotal role due to their ability to illustrate how changes in input features can alter an ML model's decision, thereby offering actionable recourse to users. Ensuring that individuals with comparable attributes, as well as those belonging to different protected (e.g., demographic) groups, receive similar and actionable recourse options is essential for trustworthy and fair decision-making. In this work, we address this challenge directly by focusing on the generation of fair CFs. Specifically, we start by defining and formalizing fairness at three levels: 1) individual fairness, ensuring that similar individuals receive similar CFs; 2) group fairness, ensuring equitable CFs across different protected groups; and 3) hybrid fairness, which accounts for both individual and broader group-level fairness. We formulate the problem as an optimization task and propose a novel model-agnostic, reinforcement-learning-based approach to generate CFs that satisfy fairness constraints at both the individual and group levels, two objectives that are usually treated as orthogonal. As fairness metrics, we extend existing metrics commonly used for auditing ML models, such as equal choice of recourse and equal effectiveness, to both individuals and groups. We evaluate our approach on three benchmark datasets, showing that it effectively ensures individual and group fairness while preserving the quality of the generated CFs in terms of proximity and plausibility, and we quantify the cost of fairness at each level separately. Our work opens a broader discussion on hybrid fairness and its role and implications for XAI, beyond CFs.
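As a rough illustration of the optimization view sketched above (the notation below is our own and is not taken from the paper): let $f$ denote the ML model, $\pi$ a CF generator mapping an input $x$ to a counterfactual $\pi(x)$, $c(x, x')$ the cost of recourse, $d$ a similarity metric, and $G_a, G_b$ two protected groups. One plausible way to encode the two fairness levels as constraints is:

% A minimal sketch of fairness-constrained CF generation; the symbols
% (f, pi, c, d, epsilon, delta, tau, G_a, G_b, y^+) are illustrative assumptions.
\begin{align}
  \min_{\pi} \;& \mathbb{E}_{x}\bigl[c\bigl(x, \pi(x)\bigr)\bigr]
     \quad \text{s.t.} \quad f\bigl(\pi(x)\bigr) = y^{+} \\
  % Individual fairness: similar inputs receive similar counterfactuals.
  & d(x_i, x_j) \le \epsilon \;\Rightarrow\; d\bigl(\pi(x_i), \pi(x_j)\bigr) \le \delta \\
  % Group fairness: recourse cost is comparable across protected groups.
  & \bigl|\, \mathbb{E}_{x \in G_a}\bigl[c\bigl(x, \pi(x)\bigr)\bigr]
     - \mathbb{E}_{x \in G_b}\bigl[c\bigl(x, \pi(x)\bigr)\bigr] \,\bigr| \le \tau
\end{align}

Under this reading, hybrid fairness would correspond to imposing both constraint families simultaneously, which is why the two objectives, usually treated as orthogonal, must be balanced within a single optimization.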