AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone by giving users additional probabilistic information. Most past research on AI and human decision-making has concentrated on model explainability and interpretability. We implemented instance-based UQ for three real datasets. To achieve this, we trained different AI classification models for each dataset and used random samples generated in the neighborhood of a given instance to create confidence intervals for UQ. As a form of quality assurance, the computed UQ was calibrated using a strictly proper scoring rule. We then conducted two preregistered online behavioral experiments that compared objective human decision-making performance under different AI information conditions, including UQ. In Experiment 1, we compared decision-making for no AI (control), AI prediction alone, and AI prediction with a visualization of UQ. We found that UQ significantly improved decision-making beyond the other two conditions. In Experiment 2, we compared different representations of UQ information: point vs. distribution of uncertainty, and visualization type (needle vs. dotplot). We did not find meaningful differences in decision-making performance among these representations. Overall, our results indicate that human decision-making can be improved by providing UQ information along with AI predictions, and that this benefit generalizes across a variety of representations of UQ.
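The instance-based UQ step described in the abstract (random samples drawn in the neighborhood of an instance, summarized as an interval, then checked with a strictly proper scoring rule) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `toy_predict_proba` is a hypothetical stand-in classifier, the Gaussian perturbation and its `scale` are assumed, the interval is a simple empirical percentile interval, and the Brier score is used as one example of a strictly proper scoring rule because the abstract does not name the rule.

```python
import math
import random


def toy_predict_proba(x):
    """Hypothetical stand-in classifier: a fixed logistic model on two features."""
    z = 1.5 * x[0] - 2.0 * x[1] + 0.25
    return 1.0 / (1.0 + math.exp(-z))


def neighborhood_interval(predict_proba, x, scale=0.1, n_samples=1000,
                          level=0.95, seed=0):
    """Return (lower, point, upper) for the predicted probability at x.

    Assumption: the neighborhood is sampled by adding independent Gaussian
    noise with standard deviation `scale` to each feature of x.
    """
    rng = random.Random(seed)
    # Predicted probabilities for perturbed copies of the instance, sorted
    # so that empirical percentiles can be read off by index.
    probs = sorted(
        predict_proba([xi + rng.gauss(0.0, scale) for xi in x])
        for _ in range(n_samples)
    )
    alpha = (1.0 - level) / 2.0
    lower = probs[int(alpha * (n_samples - 1))]
    upper = probs[int((1.0 - alpha) * (n_samples - 1))]
    return lower, predict_proba(x), upper


def brier_score(probs, labels):
    """Brier score: mean squared difference between probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


lower, point, upper = neighborhood_interval(toy_predict_proba, [0.4, 0.1])
print(f"prediction {point:.3f}, interval [{lower:.3f}, {upper:.3f}]")
```

A lower Brier score on held-out data indicates better-calibrated probabilities. Here the score only evaluates the probabilities, so how it would feed back into recalibrating the UQ is left open.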