AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no existing framework detects when an agent's capabilities change after authorization. We term this the capability-identity gap: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash, so that any tool change invalidates the certificate. Reproducibility commitments leverage the near-determinism of LLM inference for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove that they hold under a realistic adversary model. Our Rust prototype achieves 97 µs certificate verification (<1 ns capability binding overhead, ~1,200,000× faster than BAID's zkVM), 0.62 ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7× separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at F1 = 0.876 (Jaccard, θ = 0.408); single-provider deployments achieve F1 = 0.990 with 11.5× separation. We evaluate 12 attack scenarios (silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces); each is detected by a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5-to-20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8 ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.
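To make the first and third mechanisms concrete, the following is a minimal sketch of the two underlying ideas: binding a hash of the tool manifest into a certificate-like record, and chaining ledger records by hash. It is an illustration under stated assumptions, not the Rust prototype: the function names, the dictionary standing in for an X.509 v3 extension, the JSON canonicalization, and the choice of SHA-256 are all hypothetical, and signatures are omitted entirely.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the predecessor of the first record


def _sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def manifest_hash(tools):
    """Hash a canonical JSON encoding of the tool list (order-independent)."""
    ordered = sorted(tools, key=lambda t: t["name"])
    return _sha256_hex(json.dumps(ordered, sort_keys=True, separators=(",", ":")))


def issue_certificate(agent_id, tools):
    # Hypothetical stand-in for a certificate: a real one would carry this
    # hash in an X.509 v3 extension and be signed by an issuing authority.
    return {"subject": agent_id, "skills_manifest_hash": manifest_hash(tools)}


def capabilities_match(cert, current_tools):
    """True only if the agent's current tools hash to the value bound at issuance."""
    return cert["skills_manifest_hash"] == manifest_hash(current_tools)


def _record_hash(prev, payload):
    return _sha256_hex(json.dumps({"prev": prev, "payload": payload}, sort_keys=True))


def append_record(ledger, payload):
    """Append a record whose hash covers the previous record's hash (unsigned sketch)."""
    prev = ledger[-1]["record_hash"] if ledger else GENESIS
    ledger.append({"prev": prev, "payload": payload,
                   "record_hash": _record_hash(prev, payload)})


def ledger_intact(ledger):
    """Recompute the chain from the start; any edited record breaks it."""
    prev = GENESIS
    for rec in ledger:
        if rec["prev"] != prev or rec["record_hash"] != _record_hash(prev, rec["payload"]):
            return False
        prev = rec["record_hash"]
    return True


if __name__ == "__main__":
    tools = [{"name": "search", "version": "1.0"}]
    cert = issue_certificate("agent-a", tools)
    escalated = tools + [{"name": "shell", "version": "0.1"}]
    print(capabilities_match(cert, tools))      # True
    print(capabilities_match(cert, escalated))  # False
```

In this sketch, adding, removing, or modifying any tool changes the manifest hash, so a verifier comparing it against the value bound at issuance notices the change; likewise, editing any ledger record changes its hash and breaks the link from every later record.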