A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system's design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see which decisions impact fairness and how. We demonstrate how multiverse analyses can be used to better understand the fairness implications of design and evaluation decisions using an illustrative case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or "hack" a fairness metric to portray a discriminatory model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.
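The grid of "universes" described above is simply the Cartesian product of all considered decision options. A minimal sketch (the decision names, options, and metric stub below are hypothetical illustrations, not the paper's exact setup):

```python
# Sketch of a fairness "multiverse" grid: enumerate every combination of
# evaluation decisions, then score each resulting "universe".
# All decision names and options here are illustrative assumptions.
from itertools import product

# Each evaluation decision has several defensible options.
decisions = {
    "threshold": [0.4, 0.5, 0.6],              # classification cutoff
    "exclude_missing": [True, False],          # drop rows with missing labels?
    "fairness_metric": ["dem_parity", "eq_opportunity"],
}

# The full multiverse is the Cartesian product of all decision options.
universes = [dict(zip(decisions, combo)) for combo in product(*decisions.values())]


def evaluate_universe(universe):
    """Stub: in a real analysis, this would re-evaluate the fixed model
    under this universe's decisions and return its fairness score."""
    return {"universe": universe, "fairness_score": None}  # placeholder

results = [evaluate_universe(u) for u in universes]
print(len(results))  # 3 thresholds * 2 filters * 2 metrics = 12 universes
```

The resulting list of per-universe scores is the dataset on which the variability and robustness of fairness metrics can then be analyzed.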