MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

The Model Context Protocol (MCP) standardizes how large language model (LLM) agents discover, describe, and call external tools. While MCP unlocks broad interoperability, it also enlarges the attack surface by making tools first-class, composable objects with natural-language metadata and standardized I/O. We present MSB (MCP Security Benchmark), the first end-to-end evaluation suite that systematically measures how well LLM agents resist MCP-specific attacks across the full tool-use pipeline: task planning, tool invocation, and response handling. MSB contributes: (1) a taxonomy of 12 attacks, including name collision, preference manipulation, prompt injection embedded in tool descriptions, out-of-scope parameter requests, user-impersonating responses, false-error escalation, tool transfer, retrieval injection, and mixed attacks; (2) an evaluation harness that executes attacks by running real tools (both benign and malicious) via MCP rather than in simulation; and (3) a robustness metric, Net Resilient Performance (NRP), that quantifies the trade-off between security and performance. We evaluate nine popular LLM agents across 10 domains and 405 tools, producing 2,000 attack instances. The results show that attacks are effective at every stage of the MCP pipeline. Models with stronger performance are more vulnerable to attacks because of their stronger tool-calling and instruction-following capabilities. MSB provides a practical baseline for researchers and practitioners to study, compare, and harden MCP agents. Code: https://github.com/dongsenzhang/MSB
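The abstract names NRP but does not state its formula. As a rough illustration only, the sketch below assumes NRP is the product of the task completion rate under attack and the fraction of attacks resisted, i.e. NRP = PUA × (1 − ASR). The names `AttackRun`, `PUA`, and `ASR`, and the formula itself, are assumptions made for this example and are not taken from the text above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AttackRun:
    """Outcome of one attack instance (hypothetical record layout)."""
    task_completed: bool    # the agent still finished the user's task
    attack_succeeded: bool  # the malicious objective was achieved


def net_resilient_performance(runs: Sequence[AttackRun]) -> float:
    """Assumed form: NRP = PUA * (1 - ASR).

    PUA: fraction of runs where the user task was completed under attack.
    ASR: fraction of runs where the attack succeeded.
    """
    if not runs:
        raise ValueError("at least one run is required")
    n = len(runs)
    pua = sum(r.task_completed for r in runs) / n
    asr = sum(r.attack_succeeded for r in runs) / n
    return pua * (1.0 - asr)


# Four illustrative runs (made-up outcomes, not benchmark data).
example_runs = [
    AttackRun(task_completed=True, attack_succeeded=True),
    AttackRun(task_completed=True, attack_succeeded=False),
    AttackRun(task_completed=True, attack_succeeded=False),
    AttackRun(task_completed=False, attack_succeeded=False),
]
example_nrp = net_resilient_performance(example_runs)
```

Under this assumed form, an agent scores high only if it both completes tasks and resists attacks: a capable agent that follows every injected instruction scores near zero, as does a safe agent that completes nothing.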
