When using machine learning (ML) to aid decision-making, it is critical to ensure that algorithmic decisions are fair, i.e., that they do not discriminate against specific individuals or groups, particularly those from underprivileged populations. Existing group fairness methods require group-wise measures to be equal, which, however, fails to account for systematic differences between groups. Confounding factors, which are non-sensitive variables that nevertheless manifest systematic between-group differences, can significantly affect fairness evaluation. To mitigate this problem, we believe that fairness should be measured by comparing counterparts (i.e., individuals who are similar to each other with respect to the task of interest) from different groups, whose group identities cannot be distinguished algorithmically by exploring confounding factors. We develop a propensity-score-based method for identifying counterparts, which prevents fairness evaluation from comparing "oranges" with "apples". In addition, we propose a counterpart-based statistical fairness index, termed Counterpart-Fairness (CFair), to assess the fairness of ML models. We conducted empirical studies on the Medical Information Mart for Intensive Care (MIMIC)-IV database to validate the effectiveness of CFair. Our code is available at \url{https://github.com/zhengyjo/CFair}.
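The abstract does not spell out how counterparts are matched or how CFair is computed, so the following is only a minimal sketch of the general idea under stated assumptions. A propensity score (the estimated probability of belonging to group 1 given the confounding factors) is fitted with a plain logistic regression, each group-1 individual is paired with the group-0 individual whose score is closest, and model outputs are then compared across matched pairs rather than across whole groups. The function names `fit_propensity`, `match_counterparts`, and `counterpart_gap`, the logistic model, nearest-neighbor matching with replacement, and the 0.05 caliper are all hypothetical choices, not the authors' implementation; in particular, `counterpart_gap` is a stand-in statistic and not the CFair index. The synthetic data make the model's decision depend only on a confounder that differs between groups, so a naive group comparison should report a large gap while the counterpart comparison should report one close to zero.

```python
import numpy as np


def fit_propensity(X, g, lr=0.5, n_iter=2000):
    """Estimate P(group = 1 | confounders) with a plain logistic regression
    fitted by batch gradient descent; returns one score per individual.
    (Assumption: logistic model; the paper's propensity model may differ.)"""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - g) / len(g)  # gradient step on the mean log-loss
    return 1.0 / (1.0 + np.exp(-Xb @ w))


def match_counterparts(ps, g, caliper=0.05):
    """Hypothetical matching rule: pair each group-1 individual with the group-0
    individual whose propensity score is closest (matching with replacement),
    and drop pairs whose score difference exceeds the caliper."""
    idx1 = np.flatnonzero(g == 1)
    idx0 = np.flatnonzero(g == 0)
    dist = np.abs(ps[idx1][:, None] - ps[idx0][None, :])  # |ps_i - ps_j| for all pairs
    nearest = dist.argmin(axis=1)
    keep = dist[np.arange(len(idx1)), nearest] <= caliper
    return list(zip(idx1[keep].tolist(), idx0[nearest[keep]].tolist()))


def counterpart_gap(y_pred, pairs):
    """Hypothetical counterpart-based statistic (NOT the CFair index): mean
    difference in predicted outcome between matched group-1 and group-0 members."""
    if not pairs:
        return float("nan")
    i, j = map(np.array, zip(*pairs))
    return float(np.mean(y_pred[i] - y_pred[j]))


# Synthetic example: the decision depends only on a confounder x,
# but x is systematically shifted between the two groups.
rng = np.random.default_rng(0)
n = 2000
g = rng.integers(0, 2, size=n)          # sensitive attribute (two groups)
x = rng.normal(loc=1.0 * g, scale=1.0)  # confounder that differs by group
y_pred = (x > 0.5).astype(float)        # model decision uses only the confounder

raw_gap = y_pred[g == 1].mean() - y_pred[g == 0].mean()
pairs = match_counterparts(fit_propensity(x.reshape(-1, 1), g), g)
cf_gap = counterpart_gap(y_pred, pairs)
print(f"raw group gap: {raw_gap:.3f}, counterpart gap: {cf_gap:.3f}")
```

Matching with replacement is used here only because it keeps the sketch short and avoids the ordering effects of greedy one-to-one matching; one-to-one matching, different calipers, or a richer propensity model are equally plausible choices.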