Recent advances in the Model Context Protocol (MCP) have enabled large language models (LLMs) to invoke external tools with unprecedented ease, creating a new class of powerful, tool-augmented agents. Unfortunately, this capability also introduces an underexplored attack surface: the malicious manipulation of tool responses. Existing indirect prompt injection techniques that target MCP suffer from high deployment costs, weak semantic coherence, or heavy white-box requirements, and they are often easily detected by recently proposed defenses. In this paper, we propose Tree-structured Injection for Payloads (TIP), a novel black-box attack that generates natural payloads to reliably seize control of MCP-enabled agents, even when defenses are deployed. Technically, we cast payload generation as a tree-structured search problem and guide the search with an attacker LLM operating under our proposed coarse-to-fine optimization framework. To stabilize learning and avoid local optima, we introduce a path-aware feedback mechanism that surfaces only high-quality historical trajectories to the attacker model. The framework is further hardened against defensive transformations by explicitly conditioning the search on observable defense signals and dynamically reallocating the exploration budget. Extensive experiments on four mainstream LLMs show that TIP attains an attack success rate of over 95% in undefended settings while requiring an order of magnitude fewer queries than prior adaptive attacks. Against four representative defense approaches, TIP retains more than 50% effectiveness and significantly outperforms state-of-the-art attacks. By implementing the attack on real-world MCP systems, we expose an invisible but practical threat vector in MCP deployments. We also discuss potential mitigation approaches to address this critical security gap.