MCP (Model Context Protocol) enables LLMs (Large Language Models) to interact with external tools and data sources via a standardized protocol. Its rapid adoption in tool-augmented Artificial Intelligence (AI) workflows has introduced new reliability challenges, such as configuration parameters that are accepted but not enforced at runtime, which leads to unintended default behavior. Yet the runtime fault characteristics of MCP servers remain empirically unexamined. We present the first empirical taxonomy of runtime faults in MCP servers. We manually analyzed 837 MCP-specific runtime fault threads from 473 actively maintained MCP server repositories on GitHub and derived the taxonomy through bottom-up open coding. The taxonomy comprises 11 top-level categories and 27 subcategories (73 leaf fault types), covering recurrent failures in protocol interactions, tool invocations, schema enforcement, state management, model-provider integration, security validation, and timeouts or explicit cancellations of in-progress operations. To assess the taxonomy's external validity, we surveyed 55 MCP server developers. On average, respondents reported having experienced 20 of the 27 fault subcategories, and no category went unobserved. These results indicate that the taxonomy reflects runtime failures widely encountered in MCP-based systems and can support the future maintenance and evolution of AI software.