We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer, in an actionable and meaningful way, the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally grounded situation testing of Thanh et al. (2011) by using counterfactual reasoning to operationalize the notion of fairness given the difference. For any complainant, we find and compare similar protected and non-protected instances in the dataset used by the classifier to construct a control group and a test group; a difference between the decision outcomes of the two groups implies potential individual discrimination. Unlike situation testing, which builds both groups around the complainant, we build the test group around the complainant's counterfactual, generated using causal knowledge. The counterfactual is intended to reflect how the protected attribute, when changed, affects the seemingly neutral attributes used by the classifier, which is taken for granted in many frameworks for discrimination. Under CST, we compare similar individuals within each group but dissimilar individuals across the two groups, because the complainant and its counterfactual may differ. Evaluating our framework on two classification scenarios, we show that it uncovers a greater number of discrimination cases than situation testing, even when the classifier satisfies the counterfactual fairness condition of Kusner et al. (2017).
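The group construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_nearest` and `cst_gap`, the use of Euclidean distance, the group size `k`, and the sign convention of the returned gap are all assumptions made for illustration. The complainant's counterfactual is taken as a given input here; in CST it would be generated from causal knowledge, which this sketch does not model.

```python
import numpy as np

def k_nearest(X, mask, center, k):
    """Indices of the k rows of X, among those selected by mask, closest to center."""
    idx = np.flatnonzero(mask)
    dist = np.linalg.norm(X[idx] - center, axis=1)  # Euclidean distance (assumption)
    return idx[np.argsort(dist)[:k]]

def cst_gap(X, protected, y_hat, complainant, counterfactual, k=3):
    """Positive-decision rate of the test group minus that of the control group.

    Control group: protected instances nearest to the complainant.
    Test group: non-protected instances nearest to the counterfactual.
    A positive gap is read here as a flag for potential individual
    discrimination (hypothetical convention).
    """
    control = k_nearest(X, protected, complainant, k)
    test = k_nearest(X, ~protected, counterfactual, k)
    return y_hat[test].mean() - y_hat[control].mean()

# Toy data, invented for illustration only.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.1]])
protected = np.array([True, True, True, False, False, False])
y_hat = np.array([0, 0, 1, 1, 1, 1])        # classifier decisions (1 = favorable)

complainant = np.array([1.0, 1.0])
counterfactual = np.array([2.0, 2.0])       # assumed given; not derived causally here

gap = cst_gap(X, protected, y_hat, complainant, counterfactual, k=3)
print(gap)
```

Because the test group is centered on the counterfactual rather than on the complainant, the two groups can occupy different regions of the feature space, which is the point of contrast with standard situation testing made in the abstract.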