Search agents connect LLMs to the Internet, giving them access to broader and more up-to-date information. However, this also introduces a new threat surface: unreliable search results can mislead agents into producing unsafe outputs. Real-world incidents and our two in-the-wild observations show that such failures occur in practice. To study this threat systematically, we propose SafeSearch, an automated red-teaming framework that is scalable, cost-efficient, and lightweight, enabling sandboxed safety evaluation of search agents. Using this framework, we generate 300 test cases spanning five risk categories (e.g., misinformation and prompt injection) and evaluate three search agent scaffolds across 17 representative LLMs. Our results reveal substantial vulnerabilities in LLM-based search agents, with the highest attack success rate (ASR) reaching 90.5% for GPT-4.1-mini in a search-workflow setting. Moreover, we find that common defenses, such as reminder prompting, offer limited protection. Overall, SafeSearch provides a practical way to measure and improve the safety of LLM-based search agents.