People increasingly use large language models (LLMs) to explore ideas, gather information, and make sense of the world. In these interactions, they encounter agents that are overly agreeable. We argue that this sycophancy poses a unique epistemic risk to how individuals come to see the world: unlike hallucinations, which introduce falsehoods, sycophancy distorts reality by returning responses biased toward reinforcing existing beliefs. We provide a rational analysis of this phenomenon, showing that when a Bayesian agent receives data sampled based on its current hypothesis, the agent becomes increasingly confident in that hypothesis but makes no progress toward the truth. We test this prediction using a modified Wason 2-4-6 rule discovery task in which participants (N=557) interacted with AI agents providing different types of feedback. Unmodified LLM behavior suppressed discovery and inflated confidence comparably to explicitly sycophantic prompting. By contrast, unbiased sampling from the true distribution yielded discovery rates five times higher. These results reveal how sycophantic AI distorts belief, manufacturing certainty where there should be doubt.
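The Bayesian argument can be illustrated with a toy simulation. This is a hypothetical coin-bias sketch, not the paper's actual model or the 2-4-6 task: a "sycophantic" oracle samples data from the agent's current maximum-a-posteriori hypothesis, while an unbiased oracle samples from the true distribution. Under sycophantic sampling the posterior concentrates on the initial (wrong) belief; under unbiased sampling it converges to the truth.

```python
import random

def bayes_update(posterior, heads):
    """One Bayesian update of a discrete posterior over coin biases."""
    # likelihood of the observed flip under each candidate bias
    unnorm = {t: p * (t if heads else 1 - t) for t, p in posterior.items()}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}

def run(oracle, true_theta=0.8, steps=200, seed=0):
    """Simulate an agent querying either a sycophantic or an unbiased oracle."""
    rng = random.Random(seed)
    # prior favors the wrong bias (0.2); truth is 0.8
    posterior = {0.2: 0.6, 0.5: 0.3, 0.8: 0.1}
    for _ in range(steps):
        map_theta = max(posterior, key=posterior.get)
        # sycophantic oracle samples from the agent's current belief,
        # unbiased oracle samples from the true distribution
        theta = map_theta if oracle == "sycophantic" else true_theta
        heads = rng.random() < theta
        posterior = bayes_update(posterior, heads)
    return posterior

syco = run("sycophantic")
fair = run("unbiased")
print("sycophantic MAP:", max(syco, key=syco.get))  # stays at the initial belief
print("unbiased MAP:   ", max(fair, key=fair.get))  # converges to the truth
```

The sycophantic agent ends highly confident in its starting hypothesis precisely because every observation was generated to match it, mirroring the no-progress result the abstract describes.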