The increasing application of Artificial Intelligence and Machine Learning models poses potential risks of unfair behavior and, in light of recent regulations, has attracted the attention of the research community. Several researchers have focused on seeking new fairness definitions or developing approaches to identify biased predictions. However, none has tried to exploit the counterfactual space to this end. In that direction, the methodology proposed in this work aims to unveil unfair model behaviors using counterfactual reasoning in the fairness-under-unawareness setting, where the sensitive features are withheld from the model. A counterfactual version of equal opportunity, named counterfactual fair opportunity, is defined, and two novel metrics that analyze the sensitive information of counterfactual samples are introduced. Experimental results on three different datasets show the efficacy of our methodology and metrics, revealing unfair behavior in both classic machine learning models and debiasing models.
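For readers unfamiliar with the baseline notion that counterfactual fair opportunity builds on, the following is a minimal sketch of the standard (non-counterfactual) equal opportunity check: the gap in true positive rates between two sensitive groups. It is not the metric proposed in this work. The function names `true_positive_rate` and `equal_opportunity_gap` and the toy data are assumptions made for illustration only. The sensitive attribute is used solely for auditing, which is consistent with the unawareness setting, where it is not a model input.

```python
from typing import Sequence


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual positives (label 1) that were predicted positive."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("no positive examples in this group")
    return sum(1 for p in preds_on_positives if p == 1) / len(preds_on_positives)


def equal_opportunity_gap(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute difference in true positive rate between two groups.

    Hypothetical helper: `group` holds a binary sensitive attribute that is
    used only for auditing, never as a model input. A gap of 0 means equal
    opportunity holds exactly on this sample.
    """
    values = sorted(set(group))
    if len(values) != 2:
        raise ValueError("this sketch assumes exactly two sensitive groups")
    rates = []
    for g in values:
        idx = [i for i, s in enumerate(group) if s == g]
        rates.append(
            true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    return abs(rates[0] - rates[1])


# Toy data (made up): group 0 has TPR 1/2, group 1 has TPR 2/2.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1]
group = [0, 0, 1, 1, 0, 1]
print(equal_opportunity_gap(y_true, y_pred, group))  # 0.5
```

A counterfactual variant would, under the approach described above, examine counterfactual samples rather than the observed ones; the exact definition is given in the body of the paper and is not reproduced here.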