People increasingly seek personal advice from large language models (LLMs), yet whether they follow that advice, and what the consequences are for their well-being, remain unknown. In a longitudinal randomised controlled trial with a representative UK sample (N = 6,474), we found that up to 79% of participants who had a 20-minute discussion with one of three AI chatbots (GPT-4o, Llama-3.3-70B, Gemini 3 Pro) about health, careers or relationships subsequently reported following its advice. Advice-following remained above 60% even for high-stakes recommendations, suggesting that users only weakly calibrate their reliance on AI advice to potential consequences. Based on autograder evaluations of chat transcripts, LLM advice rarely violated safety best practice. However, when queried 2-3 weeks later, participants receiving personal advice from AI showed no sustained well-being benefits compared to a control group who discussed hobbies and interests with the same chatbots. These findings reveal that consumer LLMs exert substantial influence over real-world personal decisions without delivering measurable psychological benefits.