Existing AI agents typically execute multi-step tasks autonomously and ask for user confirmation only at the end. Users have little control during execution, which makes this confirm-at-end approach brittle: a single error can cascade and force a complete restart. Confirming every step avoids such failures but imposes tedious overhead. Balancing excessive interruptions against costly rollbacks remains an open challenge. We address this problem by modeling confirmation as a minimum-time scheduling problem. We conducted a formative study with eight participants, which revealed a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor errors. Based on this pattern, we developed a decision-theoretic model that determines time-efficient placement of confirmation points. We then evaluated our approach in a within-subjects study in which 48 participants monitored AI agents executing tasks and repaired their mistakes. Results show that 81% of participants preferred our intermediate confirmation approach over the confirm-at-end approach used by existing systems, and that it reduced task completion time by 13.54%.
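To make the idea of confirmation placement as a minimum-time scheduling problem concrete, the following is a minimal sketch under strong simplifying assumptions. It is not the decision-theoretic model from this work. It assumes that each step has a known duration and an independent error probability, that every confirmation costs a fixed amount of user time, and that an error found at a confirmation point costs one diagnosis plus one redo of the steps since the previous confirmation, with the redo assumed to succeed. The names `segment_cost`, `place_confirmations`, `confirm_cost`, and `diagnose_cost` are hypothetical.

```python
from typing import List, Tuple


def segment_cost(times: List[float], error_probs: List[float],
                 start: int, end: int,
                 confirm_cost: float, diagnose_cost: float) -> float:
    """Expected time for steps start..end-1, followed by one confirmation.

    Assumption (not from the paper): if any step in the segment failed,
    the user pays one diagnosis and the segment is rerun once, successfully.
    """
    run_time = sum(times[start:end])
    p_ok = 1.0
    for p in error_probs[start:end]:
        p_ok *= 1.0 - p
    p_err = 1.0 - p_ok
    return run_time + confirm_cost + p_err * (diagnose_cost + run_time)


def place_confirmations(times: List[float], error_probs: List[float],
                        confirm_cost: float,
                        diagnose_cost: float) -> Tuple[float, List[int]]:
    """Dynamic program over prefixes of the task.

    best[i] is the minimum expected time to finish and confirm the first
    i steps. The last step is always followed by a confirmation.
    Returns the expected total time and the 1-based step indices after
    which a confirmation is placed.
    """
    n = len(times)
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start] + segment_cost(
                times, error_probs, start, end, confirm_cost, diagnose_cost)
            if cost < best[end]:
                best[end] = cost
                prev[end] = start
    # Walk the back-pointers to recover the chosen confirmation points.
    points: List[int] = []
    i = n
    while i > 0:
        points.append(i)
        i = prev[i]
    points.reverse()
    return best[n], points
```

Under these assumptions the two extremes described above fall out as special cases: when steps never fail, the sketch confirms only at the end, and when confirmations are free and steps are error-prone, it confirms after every step. Intermediate placements appear when confirmation cost and rollback cost are both non-negligible.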