AI agents dynamically acquire capabilities at runtime via MCP and A2A, yet no existing framework detects when capabilities change post-authorization. We term this the capability-identity gap: it enables silent capability escalation and violates EU AI Act traceability requirements. We propose three mechanisms. Capability-bound agent certificates extend X.509 v3 with a skills manifest hash; any tool change invalidates the certificate. Reproducibility commitments leverage the near-determinism of LLM inference for post-hoc replay verification. A verifiable interaction ledger provides hash-linked, signed records for multi-agent forensic reconstruction. We formalize nine security properties and prove they hold under a realistic adversary model. Our Rust prototype achieves 97µs certificate verification (<1ns capability binding overhead, ~1,200,000× faster than BAID's zkVM), 0.62ms total governance overhead per tool call (0.1--1.2% of typical latency), and 4.7× separation from cross-provider outputs (Cohen's d > 1.0 on all four metrics), with best classification at F_1=0.876 (Jaccard, θ=0.408); single-provider deployments achieve F_1=0.990 with 11.5× separation. We evaluate 12 attack scenarios -- silent escalation, tool trojanization, phantom delegation, evidence tampering, collusion, and runtime behavioral attacks validated against NVIDIA's Nemotron-AIQ traces -- each detected by a traceable mechanism, while the MCP+OAuth 2.1 baseline detects none. An end-to-end evaluation over a 5- to 20-agent pipeline with real LLM calls confirms that full governance (G1--G3) adds ~10.8ms per pipeline run (0.12% overhead), scales sub-linearly per agent, and detects all five in-situ attacks with zero false positives.
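To make the capability-binding idea concrete, the following is a minimal sketch of how a skills-manifest hash embedded in a certificate can invalidate the credential when an agent's tool set changes. The type names, canonicalization scheme, and functions here are illustrative assumptions, not the paper's X.509 extension layout or prototype API.

```rust
// Minimal sketch (assumed API, not the paper's implementation): a certificate
// carries a hash of the agent's skills manifest; any post-authorization tool
// change alters the hash and the binding check fails.
use sha2::{Digest, Sha256};

/// Hypothetical skills manifest: the tools an agent is authorized to use.
struct SkillsManifest {
    tools: Vec<String>, // e.g. MCP tool identifiers
}

impl SkillsManifest {
    /// Canonicalize (sorted, newline-delimited) and hash the manifest so
    /// equivalent manifests always produce the same digest.
    fn digest(&self) -> [u8; 32] {
        let mut tools = self.tools.clone();
        tools.sort();
        let mut hasher = Sha256::new();
        for t in &tools {
            hasher.update(t.as_bytes());
            hasher.update(b"\n");
        }
        hasher.finalize().into()
    }
}

/// Hypothetical certificate carrying the bound manifest hash; issuer-signature
/// verification over the certificate fields is omitted here.
struct AgentCertificate {
    bound_manifest_hash: [u8; 32],
}

/// Capability-binding check: any added, removed, or renamed tool changes the
/// manifest digest, so the comparison fails and the certificate is rejected.
fn capabilities_unchanged(cert: &AgentCertificate, current: &SkillsManifest) -> bool {
    cert.bound_manifest_hash == current.digest()
}

fn main() {
    let manifest = SkillsManifest {
        tools: vec!["search_web".into(), "read_file".into()],
    };
    let cert = AgentCertificate { bound_manifest_hash: manifest.digest() };
    assert!(capabilities_unchanged(&cert, &manifest));

    // Silent escalation: a new tool appears after authorization.
    let escalated = SkillsManifest {
        tools: vec!["search_web".into(), "read_file".into(), "send_email".into()],
    };
    assert!(!capabilities_unchanged(&cert, &escalated)); // change is detected
}
```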