This paper proposes a statistical framework through which artificial intelligence can improve human decision making. We first benchmark the performance of each human decision maker against machine predictions, and then replace the decisions made by a subset of the decision makers with the recommendations of the proposed artificial intelligence algorithm. Using a large nationwide dataset of pregnancy outcomes and doctor diagnoses from pre-pregnancy checkups of reproductive-age couples, we experiment with both a heuristic frequentist approach and a Bayesian posterior loss function approach, with an application to abnormal birth detection. On a test dataset, our algorithm achieves a higher overall true positive rate and a lower false positive rate than the diagnoses made by doctors alone. We also find that the diagnoses of doctors from rural areas are more frequently replaceable, suggesting that artificial-intelligence-assisted decision making tends to improve precision more in less developed regions.
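The benchmark-then-replace idea can be illustrated with a minimal sketch. It is not the paper's actual procedure: the dominance rule used here (replace a doctor only if some machine threshold gives a true positive rate at least as high and a false positive rate at least as low, with one of the two strictly better) is an assumption chosen for illustration, and the names `rates`, `dominating_threshold`, and `replacement_plan` are hypothetical. The Bayesian posterior loss variant is not covered.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    tpr: float  # true positive rate
    fpr: float  # false positive rate


def rates(labels, decisions):
    """Compute TPR and FPR of binary decisions against binary labels."""
    tp = sum(1 for y, d in zip(labels, decisions) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, decisions) if y == 0 and d == 1)
    pos = sum(labels)
    neg = len(labels) - pos
    return Rates(tpr=tp / pos if pos else 0.0, fpr=fp / neg if neg else 0.0)


def machine_decisions(scores, threshold):
    """Flag a case as positive when the machine score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def dominating_threshold(labels, scores, doctor):
    """Return a machine threshold that dominates the doctor, or None.

    Assumed rule (illustrative only): the machine must match or beat the
    doctor on both TPR and FPR, and strictly beat the doctor on one of them.
    Among qualifying thresholds, prefer higher TPR, then lower FPR.
    """
    best = None
    for t in sorted(set(scores)):
        m = rates(labels, machine_decisions(scores, t))
        no_worse = m.tpr >= doctor.tpr and m.fpr <= doctor.fpr
        strictly_better = m.tpr > doctor.tpr or m.fpr < doctor.fpr
        if no_worse and strictly_better:
            key = (m.tpr, -m.fpr)
            if best is None or key > best[0]:
                best = (key, t)
    return best[1] if best else None


def replacement_plan(cases):
    """Map each doctor id to a replacement threshold, or None to keep the doctor.

    `cases` maps doctor id -> (labels, doctor_decisions, machine_scores),
    all taken from a held-out benchmarking sample.
    """
    plan = {}
    for doctor_id, (labels, decisions, scores) in cases.items():
        plan[doctor_id] = dominating_threshold(labels, scores, rates(labels, decisions))
    return plan
```

Calling `replacement_plan` on a dictionary of per-doctor benchmarking samples returns, for each doctor, either a score threshold at which the machine's recommendations would stand in for that doctor's diagnoses, or `None` when the doctor's own decisions are retained. A real application would also need to account for sampling noise in each doctor's estimated rates, which this sketch ignores.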