From Tool Orchestration to Code Execution: A Study of MCP Design Choices

The Model Context Protocol (MCP) provides a unified platform for agent systems to discover, select, and orchestrate tools across heterogeneous execution environments. As MCP-based systems scale to incorporate larger tool catalogs and multiple concurrently connected MCP servers, traditional tool-by-tool invocation increases coordination overhead, fragments state management, and limits support for wide-context operations. To address these scalability challenges, recent MCP designs have incorporated code execution as a first-class capability, an approach called Code Execution MCP (CE-MCP). CE-MCP enables agents to consolidate complex workflows, such as SQL querying, file analysis, and multi-step data transformations, into a single program that executes within an isolated runtime environment. In this work, we formalize the architectural distinction between context-coupled (traditional) and context-decoupled (CE-MCP) models and analyze their fundamental scalability trade-offs. Using the MCP-Bench framework across 10 representative servers, we empirically evaluate task behavior, tool utilization patterns, execution latency, and protocol efficiency as the number of connected MCP servers and available tools increases. The results show that while CE-MCP significantly reduces token usage and execution latency, it also introduces a vastly expanded attack surface. We address this security gap by applying the MAESTRO framework, identifying sixteen attack classes across five execution phases, including code-execution-specific threats such as exception-mediated code injection and unsafe capability synthesis. We validate these vulnerabilities through adversarial scenarios across multiple LLMs and propose a layered defense architecture comprising containerized sandboxing and semantic gating. Our findings provide a rigorous roadmap for balancing scalability and security in production-ready executable agent workflows.
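The context-coupled versus context-decoupled distinction can be illustrated with a toy sketch. Everything in it is hypothetical: the tool names, the `PIPELINE` list, and the use of character counts as a stand-in for token usage are assumptions made for illustration. They are not taken from the paper, from MCP-Bench, or from any MCP SDK. The sketch also omits the isolated runtime that a real CE-MCP deployment would need.

```python
# Hypothetical sketch: none of these names come from the paper or an MCP SDK.
from typing import Callable, Dict, List, Tuple

# Toy stand-ins for tools exposed by connected MCP servers.
TOOLS: Dict[str, Callable[[List[int]], List[int]]] = {
    "load_rows": lambda _: list(range(1000)),
    "filter_even": lambda rows: [r for r in rows if r % 2 == 0],
    "top_five": lambda rows: sorted(rows, reverse=True)[:5],
}
PIPELINE = ["load_rows", "filter_even", "top_five"]


def context_coupled(pipeline: List[str]) -> Tuple[List[int], int]:
    """Call tools one at a time; every intermediate result enters the context."""
    context: List[str] = []
    data: List[int] = []
    for name in pipeline:
        data = TOOLS[name](data)
        context.append(f"{name} -> {data}")  # full result is echoed back
    # Character count is an assumed, crude proxy for token usage.
    return data, sum(len(entry) for entry in context)


def context_decoupled(pipeline: List[str]) -> Tuple[List[int], int]:
    """Run the whole pipeline as one program; only the final result returns."""
    data: List[int] = []
    for name in pipeline:  # stands in for a generated program in a sandbox
        data = TOOLS[name](data)
    context = [f"program -> {data}"]  # intermediates never reach the context
    return data, sum(len(entry) for entry in context)


coupled_result, coupled_chars = context_coupled(PIPELINE)
decoupled_result, decoupled_chars = context_decoupled(PIPELINE)
print(coupled_result == decoupled_result, coupled_chars > decoupled_chars)
# prints: True True
```

Both paths compute the same answer, but the tool-by-tool path carries every intermediate result through the agent's context, while the single-program path returns only the final value. The same consolidation is what concentrates risk in the executed program, which is why the abstract pairs it with sandboxing and gating.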
