Governed MCP: Kernel-Level Tool Governance for AI Agents via Logit-Based Safety Primitives

AI agents increasingly call external tools (file systems, networks, APIs) through the Model Context Protocol (MCP). These tool calls are the agent's syscalls: privileged operations with side effects on shared state. Yet today's safety enforcement lives entirely in userspace, where a 10-line script can bypass it. I propose Governed MCP, a kernel-resident tool-governance gateway built on a logit-based safety primitive (ProbeLogits; companion paper: arXiv:2604.11943). The gateway interposes on every MCP tool call with a 6-layer pipeline: schema validation, trust-tier check, rate limit, adversarial pre-filter, ProbeLogits gate (the load-bearing semantic check), and constitutional policy match, backed by a Blake3-hashed audit chain. I implement Governed MCP in Anima OS, a bare-metal x86_64 OS written in approximately 86,000 lines of Rust. The five non-inference layers add 65.3 microseconds of overhead per call; ProbeLogits adds 65 ms (per-token-class semantic decision) on a 7B Q4_0 model. A four-configuration ablation on a 101-prompt MCP-domain benchmark shows that removing the ProbeLogits layer collapses F1 from 0.773 to 0.327 (Delta F1 = -0.446), so hand-written rule firewalling alone is insufficient. All 15 WASM-to-system host functions in the runtime route through the gateway (complete mediation of the WASM ABI surface; the scope and caveats of this claim are stated in Section 4.6). As a result, the 10-LoC userspace bypass that defeats existing guardrail libraries is structurally impossible against the kernel-resident gate.
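To make the pipeline ordering concrete, the following is a minimal Rust sketch of a six-layer, fail-closed gate with a hash-chained audit log. It is an illustration under stated assumptions, not the Anima OS implementation: the `ToolCall` fields, the thresholds, and every per-layer predicate are hypothetical placeholders; `probe_is_safe` stands in for the ProbeLogits decision, which in the described system reads model logits; and the standard library's `DefaultHasher` stands in for Blake3 so the example needs no external crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical view of one MCP tool call as seen by the gateway.
pub struct ToolCall {
    pub tool: String,
    pub args: String,
    pub trust_tier: u8,         // assumed: higher means more trusted
    pub calls_this_window: u32, // assumed: calls made in the current rate window
}

/// The six layers, in the order the abstract lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Layer { Schema, TrustTier, RateLimit, PreFilter, ProbeLogits, Policy }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Verdict { Allow, Deny(Layer) }

/// Placeholder for the semantic gate; a real gate would consult model logits.
fn probe_is_safe(call: &ToolCall) -> bool {
    !call.args.contains("/etc/shadow")
}

/// Placeholder predicates for each layer; all rules here are illustrative.
fn check(layer: Layer, call: &ToolCall) -> bool {
    match layer {
        Layer::Schema => !call.tool.is_empty(),
        Layer::TrustTier => call.trust_tier >= 1,
        Layer::RateLimit => call.calls_this_window < 100,
        Layer::PreFilter => !call.args.contains("ignore previous instructions"),
        Layer::ProbeLogits => probe_is_safe(call),
        Layer::Policy => !(call.tool == "fs.delete" && call.trust_tier < 2),
    }
}

/// Hash-chained audit log: each entry's hash covers the previous head.
pub struct AuditChain { head: u64, len: usize }

impl AuditChain {
    pub fn new() -> Self { AuditChain { head: 0, len: 0 } }
    pub fn len(&self) -> usize { self.len }
    pub fn head(&self) -> u64 { self.head }

    fn append(&mut self, call: &ToolCall, verdict: Verdict) {
        let mut h = DefaultHasher::new(); // stand-in for Blake3
        self.head.hash(&mut h);           // link to the previous entry
        call.tool.hash(&mut h);
        call.args.hash(&mut h);
        verdict.hash(&mut h);
        self.head = h.finish();
        self.len += 1;
    }
}

/// Runs the layers in order, denies at the first failing layer,
/// and records the verdict in the audit chain either way.
pub fn govern(call: &ToolCall, audit: &mut AuditChain) -> Verdict {
    const ORDER: [Layer; 6] = [
        Layer::Schema, Layer::TrustTier, Layer::RateLimit,
        Layer::PreFilter, Layer::ProbeLogits, Layer::Policy,
    ];
    let verdict = ORDER
        .iter()
        .copied()
        .find(|&layer| !check(layer, call))
        .map(Verdict::Deny)
        .unwrap_or(Verdict::Allow);
    audit.append(call, verdict);
    verdict
}
```

In this sketch, the cheap checks run before the semantic gate, so a call rejected by schema, trust tier, rate limit, or pre-filter never reaches the placeholder for the 65 ms inference step. Chaining each audit entry over the previous head means that altering an earlier record changes every later hash.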
