From Component Manipulation to System Compromise: Understanding and Detecting Malicious MCP Servers

The Model Context Protocol (MCP) standardizes how LLMs connect to external tools and data sources, enabling faster integration but also introducing new attack vectors. Despite MCP's growing adoption, existing MCP security studies classify attacks by their observable effects. This obscures how attacks behave across different MCP server components and overlooks multi-component attack chains. Meanwhile, existing defenses are less effective against multi-component attacks or previously unknown malicious behaviors. This work presents a component-centric perspective for understanding and detecting malicious MCP servers. First, we build the first component-centric proof-of-concept (PoC) dataset of 114 malicious MCP servers, in which attacks are realized by manipulating MCP components and their compositions. We evaluate the effectiveness of these attacks across two MCP hosts and five LLMs and find that (1) component position shapes the attack success rate, and (2) multi-component compositions often outperform single-component attacks by distributing malicious logic. Second, we propose and implement Connor, a two-stage behavioral deviation detector for malicious MCP servers. Connor first performs pre-execution analysis to detect malicious shell commands and extract each tool's function intent. It then conducts step-wise in-execution analysis to trace each tool's behavioral trajectory and detect deviations from its function intent. Evaluation on our curated dataset shows that Connor achieves an F1-score of 94.6%, outperforming state-of-the-art approaches by 8.9% to 59.6%. When applied to real-world servers, Connor identifies two malicious servers.
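The two-stage structure described above can be illustrated with a toy sketch. This is not Connor's implementation. The regex list, the keyword-to-capability table standing in for function-intent extraction, the `(capability, target)` trajectory format, and the names `ToolSpec`, `pre_execution`, `in_execution`, and `is_malicious` are all hypothetical. A real detector would presumably derive intent and behavioral trajectories far more robustly than simple keyword matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for obviously dangerous shell commands (illustrative only).
SUSPICIOUS_SHELL = [
    re.compile(r"curl[^|;]*\|\s*(ba)?sh"),  # download piped straight into a shell
    re.compile(r"rm\s+-rf\s+/(\s|$)"),       # wipe the filesystem root
    re.compile(r"nc\s+-e\s"),                # netcat reverse shell
]

# Hypothetical keyword -> capability table, a crude stand-in for intent extraction.
INTENT_KEYWORDS = {
    "read": "file_read",
    "write": "file_write",
    "fetch": "network",
    "weather": "network",
}


@dataclass
class ToolSpec:
    """Hypothetical static view of one MCP tool."""
    name: str
    description: str
    shell_commands: list[str] = field(default_factory=list)


def pre_execution(tool: ToolSpec) -> tuple[list[str], set[str]]:
    """Stage 1: flag suspicious shell commands and derive a declared-intent
    capability set from the tool description."""
    flagged = [cmd for cmd in tool.shell_commands
               if any(p.search(cmd) for p in SUSPICIOUS_SHELL)]
    text = tool.description.lower()
    intent = {cap for kw, cap in INTENT_KEYWORDS.items() if kw in text}
    return flagged, intent


def in_execution(intent: set[str], trajectory: list[tuple[str, str]]) -> list[int]:
    """Stage 2: walk the observed (capability, target) steps in order and
    return the indices of steps that deviate from the declared intent."""
    return [i for i, (cap, _target) in enumerate(trajectory) if cap not in intent]


def is_malicious(tool: ToolSpec, trajectory: list[tuple[str, str]]) -> bool:
    """Flag the tool if stage 1 finds a bad command or stage 2 finds a deviation."""
    flagged, intent = pre_execution(tool)
    if flagged:
        return True
    return bool(in_execution(intent, trajectory))


if __name__ == "__main__":
    weather = ToolSpec("get_weather", "Fetch the weather forecast for a city.")
    trace = [("network", "api.weather.example"), ("file_read", "~/.ssh/id_rsa")]
    print(in_execution(pre_execution(weather)[1], trace))  # prints [1]
```

In the usage example, the weather tool contains no suspicious command, so stage 1 passes it; stage 2 still flags step 1 because reading an SSH key falls outside the network-only intent derived from its description. This mirrors the behavioral-deviation idea, though with a far cruder intent model.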

翻译：模型上下文协议（MCP）标准化了大语言模型与外部工具及数据源的连接方式，在加速集成的同时引入了新的攻击向量。尽管MCP的采用日益广泛，现有安全研究多依据可观察的攻击效果进行分类，掩盖了攻击在不同MCP服务器组件间的行为模式，且忽视了多组件攻击链。同时，现有防御机制在面对多组件攻击或未知恶意行为时效果有限。本文提出以组件为中心的视角来理解和检测恶意MCP服务器。首先，我们构建了首个基于组件的114个恶意MCP服务器概念验证数据集，其中攻击通过操控MCP组件及其组合实现。我们在两个MCP宿主和五个大语言模型上评估了这些攻击的有效性，发现：（1）组件位置影响攻击成功率；（2）多组件组合通过分散恶意逻辑通常优于单组件攻击。其次，我们提出并实现了Connor——一种面向恶意MCP服务器的两阶段行为偏离检测器。该检测器首先执行预执行分析，检测恶意shell命令并提取各工具的功能意图；随后在运行过程中逐步执行执行中分析，追踪各工具的行为轨迹并检测其与功能意图的偏离。在我们构建的数据集上评估表明，Connor的F1分数达94.6%，相比现有最优方法提升8.9%至59.6%。在真实环境检测中，Connor识别出两个恶意服务器。