Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold can amplify ASR on some backends while reducing it on others. An auxiliary agent-skill probe, in which the endorsement takes the form of an install command, exposes a sharp split among otherwise robust backends: Claude over-rejects while GPT over-trusts. These findings argue for treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.