When Should Users Check? A Decision-Theoretic Model of Confirmation Frequency in Multi-Step AI Agent Tasks

Existing AI agents typically execute multi-step tasks autonomously and allow user confirmation only at the end. During execution, users have little control, which makes the confirm-at-end approach brittle: a single error can cascade and force a complete restart. Confirming every step avoids such failures but imposes tedious overhead. Balancing excessive interruptions against costly rollbacks remains an open challenge. We address this problem by modeling confirmation as a minimum-time scheduling problem. We conducted a formative study with eight participants, which revealed a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor for errors. Based on this pattern, we developed a decision-theoretic model to determine time-efficient placement of confirmation points. We then evaluated our approach in a within-subjects study in which 48 participants monitored AI agents and repaired their mistakes while the agents executed tasks. Results show that 81 percent of participants preferred our intermediate confirmation approach over the confirm-at-end approach used by existing systems, and task completion time was reduced by 13.54 percent.
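To make the idea of minimum-time confirmation scheduling concrete, here is a minimal sketch. It is not the model from the paper, whose details the abstract does not give. It assumes a hypothetical cost model: each step has a known run time and an independent error probability, every confirmation costs a fixed `confirm_cost`, and if a segment contains an error the user pays a fixed `repair_cost` (diagnosis plus correction) and the segment is redone once, with the redo assumed to succeed. All function and parameter names are illustrative.

```python
from math import prod


def segment_cost(times, probs, i, j, confirm_cost, repair_cost):
    """Expected time for running steps i..j-1 and then confirming once.

    Assumed model (not from the paper): an error anywhere in the segment
    is caught at the confirmation, costs `repair_cost` to diagnose and
    correct, and forces one redo of the segment that is assumed to succeed.
    """
    run_time = sum(times[i:j])
    p_error = 1.0 - prod(1.0 - p for p in probs[i:j])
    return run_time + confirm_cost + p_error * (repair_cost + run_time)


def plan_confirmations(times, probs, confirm_cost, repair_cost):
    """Place confirmation points to minimize expected total time.

    Dynamic program over prefixes: best[j] is the minimum expected time to
    finish the first j steps with a confirmation right after step j.
    Returns (expected_time, confirmation_points), where points are 1-based
    step indices and the last point is always the final step.
    """
    n = len(times)
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + segment_cost(
                times, probs, i, j, confirm_cost, repair_cost
            )
            if cand < best[j]:
                best[j] = cand
                prev[j] = i
    # Walk the back-pointers to recover where confirmations were placed.
    points = []
    j = n
    while j > 0:
        points.append(j)
        j = prev[j]
    return best[n], points[::-1]


if __name__ == "__main__":
    # Hypothetical inputs: six steps, times in seconds.
    step_times = [20, 15, 30, 10, 25, 20]
    error_probs = [0.05, 0.20, 0.10, 0.02, 0.15, 0.05]
    cost, points = plan_confirmations(
        step_times, error_probs, confirm_cost=8, repair_cost=30
    )
    print(f"expected time: {cost:.1f}s, confirm after steps: {points}")
```

Under these assumptions the two extremes from the abstract fall out as special cases: when errors never occur, only the final confirmation is worth paying for, and when confirmations and repairs are free but errors are possible, confirming after every step minimizes expected time. Intermediate placements arise when confirmation cost and rollback cost are both non-trivial.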

