One Model Many Scores: Using Multiverse Analysis to Prevent Fairness Hacking and Evaluate the Influence of Model Design Decisions

A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. The downstream effects of ADM systems critically depend on the decisions made during a system's design, implementation, and evaluation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these decisions are made implicitly, without knowing exactly how they will influence the final system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit decisions during design and evaluation into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can investigate the variability and robustness of fairness scores and see which decisions affect fairness, and how. We demonstrate how multiverse analyses can be used to better understand the fairness implications of design and evaluation decisions using an exemplary case study of predicting public health care coverage for vulnerable populations. Our results highlight how decisions regarding the evaluation of a system can lead to vastly different fairness metrics for the same model. This is problematic, as a nefarious actor could optimise or "hack" a fairness metric to portray a discriminating model as fair merely by changing how it is evaluated. We illustrate how a multiverse analysis can help to address this issue.
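The grid of "universes" described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the decision names (`threshold`, `group_coding`), the toy scores, and the helper functions `build_multiverse` and `positive_rate_gap` are all hypothetical, and the fairness metric used here (the largest gap in positive-prediction rates between groups) is just one possible choice.

```python
from itertools import product

# Hypothetical evaluation decisions; names and options are illustrative only.
decisions = {
    "threshold": [0.3, 0.5, 0.7],        # cutoff for a positive prediction
    "group_coding": ["binary", "fine"],  # how the protected attribute is grouped
}

# Toy (score, group) pairs standing in for the outputs of ONE fixed model.
records = [
    (0.9, "a"), (0.6, "a"), (0.4, "a"), (0.2, "a"),
    (0.8, "b"), (0.45, "b"), (0.35, "b"), (0.1, "b"),
    (0.65, "c"), (0.55, "c"), (0.25, "c"), (0.15, "c"),
]


def build_multiverse(decisions):
    """Return one dict per universe, i.e. per combination of decision options."""
    names = list(decisions)
    return [dict(zip(names, combo)) for combo in product(*decisions.values())]


def positive_rate_gap(records, threshold, group_coding):
    """Largest difference in positive-prediction rate between groups."""
    counts = {}
    for score, group in records:
        # "binary" collapses every group other than "a" into a single group.
        if group_coding == "fine" or group == "a":
            key = group
        else:
            key = "other"
        positives, total = counts.get(key, (0, 0))
        counts[key] = (positives + (score >= threshold), total + 1)
    rates = [positives / total for positives, total in counts.values()]
    return max(rates) - min(rates)


# Evaluate the same model outputs in every universe.
universes = build_multiverse(decisions)
results = [
    {**universe, "parity_gap": positive_rate_gap(records, **universe)}
    for universe in universes
]

for row in results:
    print(row)
```

Even in this toy setting, the same scores yield different fairness values depending only on how the evaluation is set up, which is why reporting the whole grid rather than a single hand-picked universe makes selective reporting harder.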
