We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and a human, each calibrated with respect to some partition of the feature space, and examine how these calibration assumptions propagate through the teaming framework. In particular, we consider frameworks that either (i) combine human and model predictions or (ii) delegate prediction responsibility to either the human or the model. We show, through theoretical and empirical results, that existing combination methods do not preserve the human's degree of calibration. Delegation methods, by the very act of delegation, preserve the calibration of the downstream predictors but shift the burden onto the rejector, the meta-model that decides who predicts. The rejector must be calibrated finely enough to locate where each team member is superior, a demand that grows with the human's expertise and becomes unattainable when the human relies on information the system cannot observe.
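To make "calibrated with respect to a partition" concrete, the following is a minimal sketch and not the paper's formal definition. It assumes one common reading: within every cell of the partition, the outcomes that receive a given predicted probability should occur at that rate. The function name `partition_calibration_error`, the toy labels, and the cell names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def partition_calibration_error(preds, labels, cells):
    """Weighted mean gap between each predicted value and the empirical
    outcome rate, computed within every (cell, predicted value) group.

    Assumed definition for illustration; zero means the predictor is
    calibrated with respect to the partition encoded by `cells`.
    """
    groups = defaultdict(list)
    for p, y, c in zip(preds, labels, cells):
        groups[(c, p)].append(y)
    n = len(preds)
    return sum(
        len(ys) / n * abs(sum(ys) / len(ys) - p)
        for (_, p), ys in groups.items()
    )


# Toy data (hypothetical): 8 binary outcomes, overall base rate 0.5.
labels = [1, 0, 1, 1, 0, 0, 0, 1]

# Two partitions of the same points: one coarse cell vs. two finer cells.
coarse_cells = ["all"] * 8
fine_cells = ["a"] * 4 + ["b"] * 4  # base rate 0.75 in "a", 0.25 in "b"

constant = [0.5] * 8                  # predicts the overall base rate
per_cell = [0.75] * 4 + [0.25] * 4    # predicts each fine cell's base rate

coarse_err = partition_calibration_error(constant, labels, coarse_cells)
fine_err_constant = partition_calibration_error(constant, labels, fine_cells)
fine_err_per_cell = partition_calibration_error(per_cell, labels, fine_cells)

print(coarse_err, fine_err_constant, fine_err_per_cell)
```

On this toy data, the constant predictor has zero error on the coarse partition but nonzero error on the finer one, while the per-cell predictor has zero error on the finer partition. Under the assumed definition, this illustrates why a finer partition is a stricter requirement, which is the kind of demand the abstract places on the rejector.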