Large language models can effectively convince people to believe conspiracies

Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power advantages truth over falsehood, or whether LLMs can promote misbeliefs just as easily as they refute them. Here, we investigate this question across three pre-registered experiments in which participants (N = 2,724 Americans) discussed a conspiracy theory they were uncertain about with GPT-4o, and the model was instructed to argue either against ("debunking") or for ("bunking") that conspiracy. When using a "jailbroken" GPT-4o variant with guardrails removed, the AI was as effective at increasing conspiracy belief as decreasing it. Concerningly, the bunking AI was rated more positively than the debunking AI, and it increased trust in AI more. Surprisingly, we found that using standard GPT-4o produced very similar effects, such that the guardrails imposed by OpenAI did little to prevent the LLM from promoting conspiracy beliefs. Encouragingly, however, a corrective conversation reversed these newly induced conspiracy beliefs, and simply prompting GPT-4o to only use accurate information dramatically reduced its ability to increase conspiracy beliefs. Our findings demonstrate that LLMs possess potent abilities to promote both truth and falsehood, but that potential solutions may exist to help mitigate this risk.

