Deferring systems extend supervised Machine Learning (ML) models with the option of deferring predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy remains an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems, which allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first, both the human and the ML model predictions are available for the deferred instances; in this case, we can identify the individual causal effects for deferred instances as well as aggregates of them. In the second scenario, only human predictions are available for the deferred instances; here, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.
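As a minimal illustration of the second scenario, the sketch below simulates a hypothetical deferring rule (defer whenever a confidence score falls below a threshold) and estimates the jump in accuracy at the cutoff with local linear fits on each side. All names, the synthetic data-generating process, the cutoff, and the bandwidth are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical deferring rule: instances with confidence score s below
# the cutoff are deferred to the human expert (treatment indicator).
n = 20000
s = rng.uniform(0.0, 1.0, n)
cutoff = 0.5
deferred = s < cutoff

# Synthetic outcome: 1 if the prediction is correct. Accuracy rises
# smoothly with s, plus an assumed +0.10 jump caused by deferral.
p_correct = 0.55 + 0.30 * s + 0.10 * deferred
y = rng.binomial(1, p_correct)

# Regression discontinuity: fit a local linear model within a bandwidth
# on each side of the cutoff; the difference of the intercepts at the
# cutoff estimates the local causal effect of deferral on accuracy.
h = 0.1
left = (s >= cutoff - h) & (s < cutoff)    # deferred side
right = (s >= cutoff) & (s < cutoff + h)   # non-deferred side

def intercept_at_cutoff(x, yy):
    # intercept of yy ~ (x - cutoff), evaluated at x = cutoff
    slope, intercept = np.polyfit(x - cutoff, yy, 1)
    return intercept

effect = intercept_at_cutoff(s[left], y[left]) - intercept_at_cutoff(s[right], y[right])
print(f"estimated local effect of deferral on accuracy: {effect:.3f}")
```

With this data-generating process the estimate should be close to the assumed discontinuity of 0.10; in practice, bandwidth choice and local polynomial order are the usual tuning knobs of a regression discontinuity design.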