Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success-probability estimates before, during, and after task execution. All settings exhibit agentic overconfidence: some agents that succeed only 22% of the time predict a 77% success rate. Counterintuitively, pre-execution assessment, despite having strictly less information, tends to yield better discrimination than standard post-execution review, though the differences are not always significant. Adversarial prompting that reframes assessment as bug-finding achieves the best calibration.
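The two quantities the abstract contrasts can be made concrete. A minimal sketch, with illustrative names and toy data (not the paper's actual evaluation code): the overconfidence gap is mean predicted probability minus empirical success rate, and discrimination can be summarized as AUROC, the probability that a randomly chosen success is assigned a higher estimate than a randomly chosen failure.

```python
def overconfidence_gap(preds, outcomes):
    """Mean predicted success probability minus empirical success rate."""
    return sum(preds) / len(preds) - sum(outcomes) / len(outcomes)

def auroc(preds, outcomes):
    """Discrimination: chance a random success outranks a random failure
    (ties count half). Hand-rolled O(n^2) version for clarity."""
    pos = [p for p, y in zip(preds, outcomes) if y == 1]
    neg = [p for p, y in zip(preds, outcomes) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data echoing the abstract's pattern: ~77% mean prediction, ~22% success.
preds    = [0.9, 0.8, 0.7, 0.8, 0.75, 0.7, 0.8, 0.6, 0.85]
outcomes = [1,   0,   0,   0,   1,    0,   0,   0,   0]
print(round(overconfidence_gap(preds, outcomes), 2))  # → 0.54
```

Note that the gap measures calibration while AUROC measures discrimination; the abstract's finding is that these can diverge, with pre-execution estimates discriminating well despite poor calibration.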