When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI

Agentic AI systems, specifically LLM-driven agents that plan, invoke tools, maintain persistent memory, and delegate tasks to peer agents via protocols such as MCP and A2A, introduce a threat surface that differs materially from that of standalone model inference. Agents accumulate sensitive context, hold credentials, and operate across pipelines that no single party fully controls, which exposes them to prompt injection, context exfiltration, credential theft, and inter-agent message poisoning. Current defenses operate entirely within the software stack and can be silently bypassed by a sufficiently privileged adversary, such as a compromised cloud operator. Confidential computing (CC) offers a hardware-rooted alternative: Trusted Execution Environments (TEEs) isolate agent code and data from privileged system software, while remote attestation enables verifiable trust across distributed deployments. This survey synthesizes the design space in four parts: (i) a unified taxonomy of six TEE platforms (Intel SGX, Intel TDX, AMD SEV-SNP, ARM TrustZone, ARM CCA, and NVIDIA H100 CC) covering deployment roles and performance tradeoffs; (ii) an agent-centric threat model spanning the perception, planning, memory, action, and coordination layers, mapped to nine security goals; (iii) a comparative survey of CC-based defenses that distinguishes findings that transfer from single-call inference from those that require new agentic designs; and (iv) six open challenges, including compound attestation for multi-hop agent chains and GPU-TEE performance at LLM scale. While several hardware trust primitives appear mature enough for targeted deployments, no broadly established end-to-end framework yet binds them into a coherent security substrate for production agentic AI.
