Search agents connect LLMs to the Internet, giving them access to broader and more up-to-date information. However, this also introduces a new threat surface: unreliable search results can mislead agents into producing unsafe outputs. Real-world incidents and our two in-the-wild observations show that such failures occur in practice. To study this threat systematically, we propose SafeSearch, a scalable, cost-efficient, and lightweight automated red-teaming framework that enables sandboxed safety evaluation of search agents. Using this framework, we generate 300 test cases spanning five risk categories (e.g., misinformation and prompt injection) and evaluate three search agent scaffolds across 17 representative LLMs. Our results reveal substantial vulnerabilities in LLM-based search agents, with the highest attack success rate (ASR) reaching 90.5% for GPT-4.1-mini in a search-workflow setting. Moreover, we find that common defenses, such as reminder prompting, offer limited protection. Overall, SafeSearch provides a practical way to measure and improve the safety of LLM-based search agents.