MCPXKIT: The Unified Toolkit for Analyzing Model Context Protocol Security

from arxiv, Accepted by IEEE Transactions on Dependable and Secure Computing (TDSC). $\href{https://ieeexplore.ieee.org/abstract/document/11531012}{Official \ version}$

The Model Context Protocol (MCP) has emerged as a universal standard that enables AI agents to seamlessly connect with external tools, significantly enhancing their functionality. However, while MCP brings notable benefits, it also introduces significant vulnerabilities, such as Tool Poisoning Attacks (TPA), where hidden malicious instructions exploit the sycophancy of large language models (LLMs) to manipulate agent behavior. Despite these risks, current academic research on MCP security remains limited, with most studies focusing on narrow or qualitative analyses that fail to capture the diversity of real-world threats. To address this gap, we present the MCP eXploit Toolkit (MCPXKIT), which categorizes and implements 31 distinct attack methods under four key classifications: direct tool injection, indirect tool injection, malicious user attacks, and LLM inherent attack. We further conduct a quantitative analysis of the efficacy of each attack. Our experiments reveal key insights into MCP vulnerabilities, including agents' blind reliance on tool descriptions, sensitivity to file-based attacks, chain attacks exploiting shared context, and difficulty distinguishing external data from executable commands. These insights, validated through attack experiments, underscore the urgency for robust defense strategies and informed MCP design. Our contributions include 1) constructing a comprehensive MCP attack taxonomy, 2) introducing a unified attack framework, MCPXKIT, and 3) conducting empirical vulnerability analysis to enhance MCP security mechanisms. This work provides a foundational framework, supporting the secure evolution of MCP ecosystems.

翻译：模型上下文协议（MCP）已成为一种通用标准，使AI智能体能够无缝连接外部工具，从而显著增强其功能。然而，MCP在带来显著优势的同时，也引入了重大安全漏洞，例如工具投毒攻击（TPA），其中隐藏的恶意指令利用大型语言模型（LLM）的谄媚倾向来操纵智能体行为。尽管存在这些风险，当前关于MCP安全的学术研究仍十分有限，大多数研究聚焦于狭窄或定性分析，未能捕捉真实世界中威胁的多样性。为填补这一空白，我们提出了MCP攻击工具包（MCPXKIT），该工具在四类关键分类下实现并分类了31种不同攻击方法：直接工具注入、间接工具注入、恶意用户攻击以及LLM固有攻击。我们进一步对每种攻击的有效性进行了定量分析。实验揭示了关于MCP漏洞的关键见解，包括智能体对工具描述的盲目依赖、对基于文件攻击的敏感性、利用共享上下文的链式攻击，以及区分外部数据与可执行命令的困难。这些通过攻击实验验证的见解，突显了制定稳健防御策略与优化MCP设计的紧迫性。我们的贡献包括：1）构建全面的MCP攻击分类体系，2）引入统一攻击框架MCPXKIT，3）进行实证漏洞分析以增强MCP安全机制。本研究为支持MCP生态系统的安全演进提供了基础性框架。