We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach leverages human judgment to distinguish inputs that are algorithmically indistinguishable, or "look the same" to predictive algorithms. We argue that this framing clarifies the problem of human-AI collaboration in prediction tasks, as experts often form judgments by drawing on information that is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of "side information", and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor and precisely quantify this improvement. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can improve algorithmic predictions on specific instances (which can be identified ex ante). In an X-ray classification task, we find that this subset constitutes nearly $30\%$ of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
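The core idea above can be illustrated with a minimal synthetic sketch. This is our own toy construction, not the paper's exact procedure: we treat inputs as "algorithmically indistinguishable" when they receive the same (coarse) algorithmic prediction, test within each such level set whether human judgments predict residual outcome variation (the "side information" test), and incorporate human feedback only on level sets where they do. All variable names and thresholds here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (illustrative, not the paper's data):
# the algorithm sees x, humans additionally observe side information s.
n = 20000
x = rng.normal(size=n)                    # features encoded in training data
s = rng.normal(size=n)                    # side information only humans observe
y = x + s + 0.3 * rng.normal(size=n)      # outcome depends on both

# A coarse predictor: inputs with the same rounded score form one
# "algorithmically indistinguishable" level set.
algo = np.round(x * 2) / 2
human = x + s + 0.5 * rng.normal(size=n)  # noisy human judgment using s

blended = algo.copy()
for level in np.unique(algo):
    m = algo == level
    if m.sum() < 50:                      # skip tiny level sets
        continue
    resid = y[m] - algo[m]                # outcome variation the algorithm misses
    h = human[m] - human[m].mean()
    # Within-level-set test: does human judgment explain the residuals?
    if abs(np.corrcoef(h, resid)[0, 1]) > 0.1:
        beta = (h @ resid) / (h @ h)      # least-squares fit on this level set
        blended[m] = algo[m] + beta * h   # selectively incorporate the human

mse_algo = np.mean((y - algo) ** 2)
mse_blend = np.mean((y - blended) ** 2)
print(f"algorithm alone: {mse_algo:.3f}, with human feedback: {mse_blend:.3f}")
```

Because the human judgment carries side information `s` that varies within every level set, the blended predictor attains a lower mean squared error than the algorithm alone, mirroring the paper's claim that human feedback can improve any feasible predictor on identifiable subsets of instances.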