Human-AI collaboration is often proposed to improve high-stakes decision-making, yet the influence of increased stakes and imperfect AI on decision-making strategies is not fully understood. Studying such behavior in realistic settings is challenging, as application-grounded evaluations are costly, rely on experts, or lack meaningful consequences for decision errors. To address this, we introduce Blockies, a parametric dataset generator for visual diagnostic tasks, and conduct an empirical study examining how perceived stakes influence reliance calibration and behavior. Results show that raised stakes lead to longer deliberation, but less calibrated reliance, with participants increasingly deferring to incorrect AI advice as decision time increased. These findings highlight that increased effort under higher stakes does not necessarily improve reliance calibration and show the importance of accounting for stakes when evaluating human-AI decision-making.