Large Language Models (LLMs) are increasingly integrated into real-world applications via the Model Context Protocol (MCP), a universal open standard for connecting AI agents with data sources and external tools. While MCP enhances the capabilities of LLM-based agents, it also introduces new security risks and significantly expands their attack surface. In this paper, we present the first formalization of a secure MCP and its required specifications. Based on this foundation, we establish a comprehensive MCP security taxonomy that extends existing models by incorporating protocol-level and host-side threats, identifying 17 distinct attack types across four primary attack surfaces. Building on these specifications, we introduce MCPSecBench, a systematic security benchmark and playground that integrates prompt datasets, MCP servers, MCP clients, attack scripts, a GUI test harness, and protection mechanisms to evaluate these threats across three major MCP platforms. MCPSecBench is designed to be modular and extensible, allowing researchers to incorporate custom implementations of clients, servers, and transport protocols for rigorous assessment. Our evaluation across these platforms reveals that every attack surface yields successful compromises. Core vulnerabilities universally affect Claude, OpenAI, and Cursor, while server-side and certain client-side attacks vary considerably across hosts and models. Furthermore, current protection mechanisms prove largely ineffective, achieving an average defense success rate of less than 30%. Overall, MCPSecBench standardizes the evaluation of MCP security and enables rigorous testing across all protocol layers.