Decision support systems based on prediction sets help humans solve multiclass classification tasks by narrowing down the set of potential label values to a subset, namely a prediction set, and asking them to always predict a label value from the prediction set. While this type of system has been proven effective at improving the average accuracy of human predictions, by restricting human agency it may cause harm: a human who has succeeded at predicting the ground-truth label of an instance on their own may have failed had they used such a system. In this paper, our goal is to control, by design, how frequently a decision support system based on prediction sets may cause harm. To this end, we start by characterizing the above notion of harm using the theoretical framework of structural causal models. Then, we show that, under a natural, albeit unverifiable, monotonicity assumption, we can estimate how frequently a system may cause harm using only predictions made by humans on their own. Further, we show that, under a weaker monotonicity assumption, which can be verified experimentally, we can bound how frequently a system may cause harm, again using only predictions made by humans on their own. Building upon these assumptions, we introduce a computational framework based on conformal risk control to design decision support systems whose harm frequency is guaranteed to stay below a user-specified threshold. We validate our framework using real human predictions from two different human subject studies and show that, in decision support systems based on prediction sets, there is a trade-off between accuracy and counterfactual harm.
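As a rough illustration of the conformal-risk-control step, the sketch below picks the smallest threshold for which a calibrated upper bound on a bounded loss (here, an indicator of counterfactual harm) falls below a target level. All names, the loss encoding, and the threshold parameterization are illustrative assumptions, not the paper's implementation; the method requires the empirical risk to be non-increasing in the threshold.

```python
import numpy as np

def conformal_risk_control(losses_by_lambda, lambdas, alpha, B=1.0):
    """Pick the smallest threshold whose calibrated risk is <= alpha.

    losses_by_lambda: (n, m) array; entry [i, j] is a bounded loss in
    [0, B] (e.g., 1 if the system would cause harm on calibration
    point i under threshold lambdas[j], else 0). Losses are assumed
    non-increasing in lambda, and lambdas is assumed sorted so that
    the empirical risk is non-increasing along the columns.
    """
    n = losses_by_lambda.shape[0]
    risk = losses_by_lambda.mean(axis=0)
    # Conformal risk control certifies the level alpha when the
    # inflated empirical risk satisfies
    #   (n / (n + 1)) * Rhat(lambda) + B / (n + 1) <= alpha.
    adjusted = (n / (n + 1)) * risk + B / (n + 1)
    valid = np.where(adjusted <= alpha)[0]
    if valid.size == 0:
        return None  # no threshold certifies the target risk level
    return lambdas[valid[0]]
```

For example, with ten calibration points whose harm indicators shrink as the threshold grows, the function returns the first threshold whose inflated risk clears the target level, or `None` when the target is unattainable on that calibration set.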