AI systems are fallible, and humans can make mistakes in deciding whether to trust AI over their own judgment. Thus, improving human--AI collaboration requires understanding when, why, and how humans decide to rely on AI. We study two distinct reliance decisions: the delegation choice (deciding when to let AI act autonomously without knowing its output) and the adoption choice (evaluating AI suggestions and deciding how to use them). Both of these decoupled reliance patterns shape collaboration, but prior work rarely studies them together in realistic settings with the same users. We address this gap by studying collaborative human--AI teams competing in a question-answering game in which humans choose when and how to work with AI agents to win. Our 24 matches pair 23 expert humans with 16 AI agents, capturing 387 delegation and 1440 adoption decisions. While human--AI collaboration performs better than either AI or humans alone, humans make suboptimal collaboration decisions, both under-relying on correct AI suggestions (3.9% of opportunities missed) and over-relying when AI misleads them (1.7%). Both parties contribute wrong answers: reported model confidence is near chance when humans and AI disagree, while confirmation bias drives higher under-reliance (64.5%) when an AI suggestion agrees with humans' initial incorrect answer. To close this gap, we recommend calibrated confidence, evidence-grounded explanations, and mechanisms that help users refine trust.