Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power favors truth over falsehood, or whether LLMs can promote misbeliefs just as easily as they can refute them. Here, we investigate this question across three pre-registered experiments in which participants (N = 2,724 Americans) discussed a conspiracy theory they were uncertain about with GPT-4o, and the model was instructed to either argue against ("debunking") or for ("bunking") that conspiracy. When using a "jailbroken" GPT-4o variant with guardrails removed, the AI was as effective at increasing conspiracy belief as at decreasing it. Concerningly, the bunking AI was rated more positively, and increased trust in AI more, than the debunking AI. Surprisingly, we found that using standard GPT-4o produced very similar effects, such that the guardrails imposed by OpenAI did little to prevent the LLM from promoting conspiracy beliefs. Encouragingly, however, a corrective conversation reversed these newly induced conspiracy beliefs, and simply prompting GPT-4o to use only accurate information dramatically reduced its ability to increase conspiracy beliefs. Our findings demonstrate that LLMs possess potent abilities to promote both truth and falsehood, but that potential solutions may exist to help mitigate this risk.