Large language models (LLMs) increasingly support heterogeneous tasks within a single interface, requiring users to form, update, and act upon beliefs about one system across domains with different reliability profiles. Understanding how such beliefs transfer across tasks and shape delegation is therefore critical for the design of multipurpose AI systems. We report a preregistered experiment (N=240; 7,200 trials) in which participants interacted with a controlled AI simulation across grammar checking, travel planning, and visual question answering, each with fixed, domain-typical accuracy levels. Delegation was operationalized as a binary reliance decision (accepting the AI's output versus acting independently), and belief dynamics were evaluated against Bayesian benchmarks. We find three main results. First, participants do not reset beliefs between tasks: priors in a new task depend on posteriors from the previous task, with a 10-point increase in the posterior predicting a 3- to 4-point higher subsequent prior. Second, within tasks, belief updating follows the Bayesian direction but is substantially conservative, proceeding at roughly half the normative Bayesian rate. Third, delegation is driven primarily by subjective beliefs about AI accuracy rather than self-confidence, though confidence independently reduces reliance when beliefs are held constant. Together, these findings show that users form global, path-dependent expectations about multipurpose AI systems, update them conservatively, and rely on AI primarily based on subjective beliefs rather than objective performance. We discuss implications for expectation calibration, reliance design, and the risks of belief spillovers in deployed LLM-based interfaces.
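The reported belief dynamics (conservative log-odds updating at roughly half the normative Bayesian rate, and a cross-task carryover of 3-4 points per 10-point posterior shift) can be sketched as a minimal illustrative model. This is an assumption-laden sketch, not the paper's estimation procedure: the two-hypothesis accuracy values `p_hi`/`p_lo`, the log-odds form of the update, and the linear carryover function are all hypothetical choices made for illustration.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def update_belief(belief, outcome_correct, p_hi=0.9, p_lo=0.5, rate=1.0):
    """Update the subjective probability that the AI is 'high accuracy'.

    rate=1.0 gives the normative Bayesian update in log-odds space;
    rate=0.5 mimics the conservative updating described above
    (roughly half the normative rate). p_hi/p_lo are illustrative
    per-trial accuracies under the two hypotheses, not values from
    the experiment.
    """
    if outcome_correct:
        llr = math.log(p_hi / p_lo)
    else:
        llr = math.log((1 - p_hi) / (1 - p_lo))
    return sigmoid(logit(belief) + rate * llr)

def carryover_prior(baseline, prev_posterior, beta=0.35):
    """Prior in a new task anchored on the previous task's posterior:
    with beta between 0.3 and 0.4, a 10-point higher posterior yields
    a roughly 3- to 4-point higher subsequent prior. The linear form
    and the baseline anchoring are assumptions for illustration.
    """
    return baseline + beta * (prev_posterior - baseline)

# One correct AI response moves a conservative updater (rate=0.5)
# in the Bayesian direction, but less far than a full Bayesian.
b0 = 0.5
bayes = update_belief(b0, outcome_correct=True, rate=1.0)
conservative = update_belief(b0, outcome_correct=True, rate=0.5)
```

Under these assumptions, `b0 < conservative < bayes` after a correct observation, reproducing directionally correct but dampened updating, while `carryover_prior` captures the path dependence of priors across tasks.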