As large language models (LLMs) evolve into autonomous agents that synthesize real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat in which colluding agents steer a victim's beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communication, backdoors, or falsified documents. Exploiting LLMs' tendency to overthink, we formalize the first cognitive collusion attack and propose Generative Montage, a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and the coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across 14 LLM families. Our results show pervasive vulnerability: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompt-based variants. These false beliefs also cascade to downstream judges, yielding deception rates above 60% and highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.