From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

The model context protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but introducing new attack vectors. Despite the growing adoption of MCP, existing MCP security studies classify attacks by their observable effects, obscuring how attacks behave across different MCP server components and overlooking multi-component attack chains. Meanwhile, existing defenses are less effective when facing multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric PoC dataset of 114 malicious MCP servers where attacks are achieved as manipulation over MCP components and their compositions. We evaluate these attacks' effectiveness across two MCP hosts and five LLMs, and uncover that (1) component position shapes attack success rate; and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. It first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent, and then conducts step-wise in-execution analysis to trace each tool's behavioral trajectories and detect deviations from its function intent. Evaluation on our curated dataset indicates that Connor achieves an F1-score of 94.6%, outperforming the state of the art by 8.9% to 59.6%. In real-world detection, Connor identifies two malicious servers.

翻译：模型上下文协议（MCP）规范了大型语言模型（LLM）连接外部工具与数据源的方式，在实现快速集成的同時也引入了新的攻击向量。尽管MCP的应用日益广泛，现有MCP安全研究主要根据攻击的可观察效应进行分类，这掩盖了攻击在不同MCP服务器组件间的具体行为模式，并忽视了多组件攻击链的存在。同时，现有防御机制在面对多组件攻击或前所未见的恶意行为时效果有限。本文提出了以组件为中心的恶意MCP服务器理解与检测视角。首先，我们构建了首个基于114个恶意MCP服务器的组件中心概念验证数据集，其中攻击通过操纵MCP组件及其组合实现。我们评估了这些攻击在两个MCP主机和五个LLM上的有效性，发现：（1）组件位置影响攻击成功率；（2）多组件组合通过分散恶意逻辑往往优于单组件攻击。其次，我们提出并实现了Connor——一种针对恶意MCP服务器的两阶段行为偏差检测器。该检测器首先执行预执行分析以检测恶意shell命令并提取每个工具的功能意图，随后进行逐步的执行中分析以追踪每个工具的行为轨迹并检测与功能意图的偏差。在我们整理的数据集上的评估表明，Connor的F1分数达到94.6%，比现有最优方法高出8.9%至59.6%。在真实场景检测中，Connor成功识别出两个恶意服务器。