Prediction models have been widely adopted as the basis for decision-making in domains as diverse as employment, education, lending, and health. Yet few real-world problems readily present themselves as precisely formulated prediction tasks. In particular, there are often many reasonable choices of target variable. Prior work has argued that this is an important and sometimes underappreciated choice, and has also shown that target choice can have a significant impact on the fairness of the resulting model. However, the existing literature does not offer a formal framework for characterizing the extent to which target choice matters in a particular task. Our work fills this gap by drawing connections between the problem of target choice and recent work on predictive multiplicity. Specifically, we introduce a conceptual and computational framework for assessing how the choice of target affects individuals' outcomes and selection rate disparities across groups. We call this multi-target multiplicity. Along the way, we refine the study of single-target multiplicity by introducing notions of multiplicity that respect resource constraints, a feature of many real-world tasks that is not captured by existing notions of predictive multiplicity. We apply our methods to a healthcare dataset, and show that the level of multiplicity that stems from target variable choice can be greater than that stemming from nearly optimal models of a single target.