We present counterfactual situation testing (CST), a causal data mining framework for detecting discrimination in classifiers. CST aims to answer, in an actionable and meaningful way, the intuitive question "what would have been the model outcome had the individual, or complainant, been of a different protected status?" It extends the legally grounded situation testing of Thanh et al. (2011) by using counterfactual reasoning to operationalize the notion of fairness given the difference. For any complainant, we find and compare similar protected and non-protected instances in the dataset used by the classifier to construct a control group and a test group; a difference between the decision outcomes of the two groups implies potential individual discrimination. Unlike situation testing, which builds both groups around the complainant, we build the test group around the complainant's counterfactual, which is generated using causal knowledge. The counterfactual is intended to reflect how the protected attribute, when changed, affects the seemingly neutral attributes used by the classifier, an effect that is taken for granted in many frameworks for discrimination. Under CST, we therefore compare similar individuals within each group but dissimilar individuals across the two groups, because the complainant and their counterfactual may differ. Evaluating our framework on two classification scenarios, we show that it uncovers more discrimination cases than situation testing, even when the classifier satisfies the counterfactual fairness condition of Kusner et al. (2017).
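The group construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records, the single feature, the linear additive structural equation `x = BETA * a + u`, the Euclidean k-nearest-neighbour search, and the use of the difference in negative-decision rates as the comparison statistic are all assumptions made for illustration, and every function name is hypothetical.

```python
from math import dist

# Hypothetical toy data: a = 1 marks the protected group, x is a single
# seemingly neutral feature, y is the classifier's decision (0 = negative).
DATA = [
    {"a": 1, "x": (0.9,), "y": 0},
    {"a": 1, "x": (1.2,), "y": 0},
    {"a": 1, "x": (0.7,), "y": 0},
    {"a": 1, "x": (3.0,), "y": 1},
    {"a": 0, "x": (3.0,), "y": 1},
    {"a": 0, "x": (3.3,), "y": 1},
    {"a": 0, "x": (2.8,), "y": 1},
    {"a": 0, "x": (1.05,), "y": 0},
    {"a": 0, "x": (1.15,), "y": 0},
]

BETA = -2.0  # assumed causal effect of the protected attribute on x


def counterfactual_x(x, a, a_new):
    """Assumed linear additive model x = BETA * a + u: recover u, then set a_new."""
    return tuple(BETA * a_new + (xi - BETA * a) for xi in x)


def k_nearest(center, rows, k):
    """Return the k rows whose features are closest (Euclidean) to center."""
    return sorted(rows, key=lambda r: dist(r["x"], center))[:k]


def negative_rate(rows):
    """Share of rows that received the negative decision."""
    return sum(1 for r in rows if r["y"] == 0) / len(rows)


def group_difference(control_center, test_center, data, k):
    """Negative-decision rate of the control group minus that of the test group."""
    protected = [r for r in data if r["a"] == 1]
    non_protected = [r for r in data if r["a"] == 0]
    control = k_nearest(control_center, protected, k)
    test = k_nearest(test_center, non_protected, k)
    return negative_rate(control) - negative_rate(test)


complainant = (1.0,)  # protected complainant (a = 1)
cf = counterfactual_x(complainant, a=1, a_new=0)

# Situation testing: both groups are centred on the complainant.
st_delta = group_difference(complainant, complainant, DATA, k=2)
# CST: the test group is centred on the complainant's counterfactual.
cst_delta = group_difference(complainant, cf, DATA, k=2)

print(st_delta, cst_delta)
```

In this toy setting the protected attribute lowers the feature, so non-protected instances near the complainant are also rejected and the situation-testing difference is zero, while the non-protected instances near the counterfactual are accepted and the CST difference is positive.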