Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models

Retrieval-Augmented Generation (RAG) systems built on Large Language Models (LLMs) have become essential for tasks such as question answering and content generation. Their growing influence on public opinion and information dissemination, combined with their inherent vulnerabilities, has made them a critical focus of security research. Previous studies have mainly addressed attacks that manipulate facts or target single queries. In this paper, we address a more practical scenario: topic-oriented adversarial opinion manipulation attacks on RAG models, a setting in which LLMs must reason over and synthesize multiple perspectives and are therefore particularly susceptible to systematic knowledge poisoning. Specifically, we propose Topic-FlipRAG, a two-stage manipulation attack pipeline that strategically crafts adversarial perturbations to influence opinions across related queries. The approach combines traditional adversarial ranking attack techniques with the extensive internal knowledge and reasoning capabilities of LLMs to carry out semantic-level perturbations. Experiments show that the proposed attacks effectively shift the opinion expressed in the model's outputs on specific topics, significantly affecting how users perceive information. Current mitigation methods cannot effectively defend against such attacks, which highlights the need for stronger safeguards in RAG systems and offers crucial insights for LLM security research.
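
The abstract's central distinction is between poisoning aimed at a single query and poisoning aimed at a whole topic of related queries. The sketch below is a toy illustration of that distinction only, not Topic-FlipRAG itself: it does not implement the paper's two-stage pipeline (adversarial ranking attack plus LLM-driven semantic perturbation). It assumes a lexical term-overlap retriever as a stand-in for the neural retrievers a real RAG system would use, hand-written passages, and k = 2. The function names `retrieve` and `hit_rate`, the example queries, and the corpus are all hypothetical.

```python
"""Toy illustration: topic-level vs single-query knowledge poisoning.

Assumption: a lexical term-overlap retriever stands in for a neural
retriever. All queries and passages are hypothetical, hand-written text.
"""

PUNCT = ".,?!:;"


def tokens(text: str) -> set[str]:
    # Lowercase each whitespace token, strip surrounding punctuation, drop empties.
    return {t for t in (raw.strip(PUNCT).lower() for raw in text.split()) if t}


def score(query: str, passage: str) -> int:
    # Number of distinct query terms that also appear in the passage.
    return len(tokens(query) & tokens(passage))


def retrieve(query: str, corpus: list[str], k: int) -> list[int]:
    # Indices of the k highest-scoring passages. Ties keep corpus order
    # because Python's sort is stable, including with reverse=True.
    ranked = sorted(range(len(corpus)), key=lambda i: score(query, corpus[i]), reverse=True)
    return ranked[:k]


def hit_rate(queries: list[str], corpus: list[str], poison: str, k: int = 2) -> float:
    # Fraction of related queries whose top-k results contain the injected passage.
    poisoned = corpus + [poison]
    target = len(poisoned) - 1
    hits = sum(target in retrieve(q, poisoned, k) for q in queries)
    return hits / len(queries)


# Hypothetical queries that all belong to one topic.
TOPIC_QUERIES = [
    "does remote work increase productivity",
    "is remote work bad for teams",
    "remote work and employee wellbeing",
]
CLEAN_CORPUS = [
    "studies report remote work can increase productivity for focused tasks",
    "some teams find remote work hurts collaboration",
    "employee wellbeing often improves with flexible schedules",
    "office layouts affect noise levels",
]
# Hypothetical injected passages: one written for a single query, one for the topic.
SINGLE_QUERY_POISON = "remote work does not increase productivity"
TOPIC_POISON = ("remote work is bad: it does not increase productivity, "
                "hurts teams, and harms employee wellbeing")

if __name__ == "__main__":
    single = hit_rate(TOPIC_QUERIES, CLEAN_CORPUS, SINGLE_QUERY_POISON)
    topic = hit_rate(TOPIC_QUERIES, CLEAN_CORPUS, TOPIC_POISON)
    print(f"single-query passage hit rate: {single:.2f}")
    print(f"topic-level passage hit rate:  {topic:.2f}")
```

Running it prints a hit rate of 0.33 for the single-query passage (it reaches the top 2 only for the productivity query) and 1.00 for the topic-level passage (it reaches the top 2 for all three related queries). The sketch measures retrieval exposure only; whether the generated answer's opinion actually shifts depends on the LLM, which is the part the paper's semantic-level perturbation stage targets.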
