Model Context Protocol (MCP) is a rapidly adopted standard for defining and invoking external tools in LLM applications. The multi-layered architecture of MCP introduces new attack surfaces such as tool poisoning, in addition to traditional prompt injection. Existing defense systems suffer from limitations including high false positive rates, API dependency, or white-box access requirements. In this study, we propose CASCADE, a three-tiered cascaded defense architecture for MCP-based systems: (i) Layer 1 performs fast pre-filtering using regex, phrase weighting, and entropy analysis; (ii) Layer 2 conducts semantic analysis via BGE embedding with an Ollama Llama3 fallback mechanism; (iii) Layer 3 applies pattern-based output filtering. Evaluation on a dataset of 5,000 samples yielded 95.85% precision, 6.06% false positive rate, 61.05% recall, and 74.59% F1-score. Analysis across 31 attack types categorized into 6 tiers revealed high detection rates for data exfiltration (91.5%) and prompt injection (84.2%), while semantic attack (52.5%) and tool poisoning (59.9%) categories showed potential for improvement. A key advantage of CASCADE over existing solutions is its fully local operation, requiring no external API calls
翻译:模型上下文协议(MCP)是一种在大型语言模型(LLM)应用中快速被采纳的、用于定义和调用外部工具的标准。除了传统的提示注入外,MCP的多层架构还引入了新的攻击面,例如工具投毒。现有防御系统存在局限,包括高误报率、API依赖性或需要白盒访问。在本研究中,我们提出了CASCADE——一种用于基于MCP系统的三层级联防御架构:(i)第一层利用正则表达式、短语加权和熵分析执行快速预过滤;(ii)第二层通过BGE嵌入结合Ollama Llama3回退机制进行语义分析;(iii)第三层应用基于模式的输出过滤。在包含5000个样本的数据集上评估显示,精确率为95.85%,误报率为6.06%,召回率为61.05%,F1分数为74.59%。针对划分为6个层级的31种攻击类型的分析表明,数据泄露(91.5%)和提示注入(84.2%)的检出率较高,而语义攻击(52.5%)和工具投毒(59.9%)类别仍有改进空间。CASCADE相较于现有方案的关键优势在于其完全本地化运行,无需任何外部API调用。