In high-stakes domains such as healthcare and hiring, the role of machine learning (ML) in decision-making raises significant fairness concerns. This work focuses on Counterfactual Fairness (CF), which posits that an ML model's outcome on any individual should remain unchanged had they belonged to a different demographic group. Previous works have proposed methods that guarantee CF; however, their effects on the model's predictive performance remain largely unclear. To fill this gap, we provide a theoretical study of the inherent trade-off between CF and predictive performance in a model-agnostic manner. We first propose a simple but effective method to cast an optimal but potentially unfair predictor into a fair one without sacrificing optimality. By analyzing the excess risk this method incurs to achieve CF, we quantify this inherent trade-off. We further analyze our method's performance when only incomplete causal knowledge is available and, building on this analysis, propose a performant algorithm for such scenarios. Experiments on both synthetic and semi-synthetic datasets demonstrate the validity of our analysis and methods.