The unreliability of LLMs on open-ended and knowledge-intensive tasks has become increasingly apparent, prompting a shift toward search-augmented LLMs to mitigate this issue. However, once the search engine is invoked for a harmful task, the outcome is no longer under the LLM's control: if the returned content directly contains targeted, ready-to-use harmful takeaways, the LLM's safeguards cannot undo that exposure. Motivated by this dilemma, we identify web search as a critical attack surface and propose \textbf{\textit{SearchAttack}} for red-teaming. SearchAttack outsources the harmful semantics to web search, retaining only the query's skeleton and fragmented clues, and further steers the LLM to reconstruct the retrieved content via structural rubrics to achieve malicious goals. We conduct extensive experiments to red-team search-augmented LLMs for responsible vulnerability assessment. Empirically, SearchAttack demonstrates strong effectiveness in attacking these systems.