Large language models (LLMs) increasingly support heterogeneous tasks within a single interface, requiring users to form, update, and act upon beliefs about one system across domains with different reliability profiles. Understanding how such beliefs transfer across tasks and shape delegation is therefore critical for the design of multipurpose AI systems. We report a preregistered experiment (N=240; 7,200 trials) in which participants interacted with a controlled AI simulation across grammar checking, travel planning, and visual question answering, each with fixed, domain-typical accuracy levels. Delegation was operationalized as a binary reliance decision (accepting the AI's output versus acting independently), and belief dynamics were evaluated against Bayesian benchmarks. We find three main results. First, participants do not reset beliefs between tasks: priors in a new task depend on posteriors from the previous task, with a 10-point increase in the posterior predicting a 3- to 4-point higher subsequent prior. Second, within tasks, belief updating follows the Bayesian direction but is substantially conservative, proceeding at roughly half the normative Bayesian rate. Third, delegation is driven primarily by subjective beliefs about AI accuracy rather than self-confidence, though confidence independently reduces reliance when beliefs are held constant. Together, these findings show that users form global, path-dependent expectations about multipurpose AI systems, update them conservatively, and rely on AI primarily based on subjective beliefs rather than objective performance. We discuss implications for expectation calibration, reliance design, and the risks of belief spillovers in deployed LLM-based interfaces.
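The reported belief dynamics (conservative log-odds updating at roughly half the normative Bayesian rate, and a cross-task carryover of 3-4 points per 10-point posterior shift) can be sketched as a minimal illustrative model. This is an assumption-laden sketch, not the paper's estimation procedure: the two-hypothesis accuracy values `p_hi`/`p_lo`, the log-odds form of the update, and the linear carryover function are all hypothetical choices made for illustration.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def update_belief(belief, outcome_correct, p_hi=0.9, p_lo=0.5, rate=1.0):
    """Update the subjective probability that the AI is 'high accuracy'.

    rate=1.0 gives the normative Bayesian update in log-odds space;
    rate=0.5 mimics the conservative updating described above
    (roughly half the normative rate). p_hi/p_lo are illustrative
    per-trial accuracies under the two hypotheses, not values from
    the experiment.
    """
    if outcome_correct:
        llr = math.log(p_hi / p_lo)
    else:
        llr = math.log((1 - p_hi) / (1 - p_lo))
    return sigmoid(logit(belief) + rate * llr)

def carryover_prior(baseline, prev_posterior, beta=0.35):
    """Prior in a new task anchored on the previous task's posterior:
    with beta between 0.3 and 0.4, a 10-point higher posterior yields
    a roughly 3- to 4-point higher subsequent prior. The linear form
    and the baseline anchoring are assumptions for illustration.
    """
    return baseline + beta * (prev_posterior - baseline)

# One correct AI response moves a conservative updater (rate=0.5)
# in the Bayesian direction, but less far than a full Bayesian.
b0 = 0.5
bayes = update_belief(b0, outcome_correct=True, rate=1.0)
conservative = update_belief(b0, outcome_correct=True, rate=0.5)
```

Under these assumptions, `b0 < conservative < bayes` after a correct observation, reproducing directionally correct but dampened updating, while `carryover_prior` captures the path dependence of priors across tasks.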