A Formal Security Framework for MCP-Based AI Agents: Threat Taxonomy, Verification Models, and Defense Mechanisms

The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and now governed by the Linux Foundation's Agentic AI Foundation, has rapidly become the de facto standard for connecting large language model (LLM)-based agents to external tools and data sources, with over 97 million monthly SDK downloads and more than 177,000 registered tools. This rapid adoption has exposed a critical gap: there is no unified, formal security framework for systematically characterizing, analyzing, and mitigating the diverse threats facing MCP-based agent ecosystems. Existing security research remains fragmented across individual attack papers, isolated benchmarks, and point defenses. This paper presents MCPSHIELD, a comprehensive formal security framework for MCP-based AI agents. We make four principal contributions: (1) a hierarchical threat taxonomy comprising 7 threat categories and 23 distinct attack vectors organized across four attack surfaces, grounded in an analysis of over 177,000 MCP tools; (2) a formal verification model based on labeled transition systems with trust-boundary annotations, which enables static and runtime analysis of MCP tool interaction chains; (3) a systematic comparative evaluation of 12 existing defense mechanisms, identifying the gaps in their coverage of our threat taxonomy; and (4) a defense-in-depth reference architecture integrating capability-based access control, cryptographic tool attestation, information flow tracking, and runtime policy enforcement. Our analysis reveals that no single existing defense covers more than 34 percent of the identified threat landscape, whereas MCPSHIELD's integrated architecture achieves a theoretical coverage of 91 percent. We further identify seven open research challenges that must be addressed to secure the next generation of agentic AI systems.
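
To make the idea of a labeled transition system with trust-boundary annotations more concrete, the following is a minimal illustrative sketch in Python. It is not the paper's actual model: the state names, the `TRUST` annotation, the `call:sanitize` guard label, and the helper functions are all hypothetical and chosen only for illustration. It assumes each state carries a trust label and flags any reachable transition that moves from an untrusted state to a trusted one without passing through a guard action.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    source: str
    label: str   # action name, e.g. a tool call (hypothetical labels)
    target: str


# Hypothetical trust annotation per state; not taken from the paper.
TRUST = {
    "user_prompt": "trusted",
    "web_fetch_result": "untrusted",
    "sanitized": "trusted",
    "shell_exec": "trusted",
}

# Hypothetical tool interaction chain expressed as labeled transitions.
TRANSITIONS = [
    Transition("user_prompt", "call:web_fetch", "web_fetch_result"),
    Transition("web_fetch_result", "call:sanitize", "sanitized"),
    Transition("sanitized", "call:shell", "shell_exec"),
    Transition("web_fetch_result", "call:shell", "shell_exec"),
]


def reachable_states(transitions, start):
    """Breadth-first search over the transition relation from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for t in transitions:
            if t.source == state and t.target not in seen:
                seen.add(t.target)
                queue.append(t.target)
    return seen


def find_violations(transitions, trust, start,
                    guards=frozenset({"call:sanitize"})):
    """Return reachable transitions that cross from an untrusted state
    into a trusted state without using a guard label."""
    live = reachable_states(transitions, start)
    return [
        t for t in transitions
        if t.source in live
        and trust[t.source] == "untrusted"
        and trust[t.target] == "trusted"
        and t.label not in guards
    ]


if __name__ == "__main__":
    for t in find_violations(TRANSITIONS, TRUST, "user_prompt"):
        print(f"unguarded boundary crossing: {t.source} --{t.label}--> {t.target}")
```

In this toy chain, the only transition flagged is the direct `call:shell` step out of `web_fetch_result`; the path that goes through `call:sanitize` first is accepted. A static check of this kind inspects the whole transition relation ahead of time, while a runtime monitor could apply the same predicate to each transition as it occurs.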

