Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes them to new security risks from poisoned retrieval content. Existing RAG attacks largely target individual queries or narrow, topic-local query sets, which restricts their practical reach and offers little camouflage in real-world settings. In this paper, we introduce discourse-level opinion manipulation, a new threat model in which coordinated influence across a semantic query network induces opinion shifts over a holistic, multi-topic query space. We formalize this threat in a black-box setting and propose DiscourseFlip, an agentic, graph-guided attack that dynamically allocates a limited poisoning budget to maximize discourse-level opinion deviation. Extensive experiments demonstrate that DiscourseFlip consistently induces targeted opinion shifts across the contextualized query network and significantly outperforms existing baselines in both coverage and effectiveness. User studies further confirm that DiscourseFlip remains effective while staying well camouflaged from user detection. Moreover, systematic analyses show that existing mitigation strategies are ineffective against discourse-level manipulation, underscoring the urgent need for more robust and adaptive defenses against such vulnerabilities.