Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red-teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement defense in depth across four layers: (1) kernel-level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars preventing agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted content labeling. We report results from 90 days of deployment, including four HIGH-severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.
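To make the fourth layer concrete, the sketch below shows one way a structured metadata envelope with untrusted-content labeling could work. This is a minimal illustration under assumed conventions, not the paper's released framework: the function name `wrap_untrusted`, the marker strings, and the JSON header format are all hypothetical.

```python
import json

# Hypothetical delimiter strings; the released framework's actual
# envelope format may differ.
ENVELOPE_BEGIN = "<<UNTRUSTED"
ENVELOPE_END = "<<END_UNTRUSTED>>"

def wrap_untrusted(content: str, source: str) -> str:
    """Label externally fetched content (web pages, emails, tool output)
    so the agent's system prompt can instruct the model to treat it as
    data only, never as instructions.

    Neutralizes any embedded end-marker first, so fetched content cannot
    prematurely close its own envelope (a delimiter-injection attempt).
    """
    safe = content.replace(ENVELOPE_END, "<<escaped_END_UNTRUSTED>>")
    header = json.dumps({"source": source, "trust": "external"})
    return f"{ENVELOPE_BEGIN} {header}>>\n{safe}\n{ENVELOPE_END}"
```

The design intent is that trust metadata travels with the content itself: any instruction-like text inside the envelope is attributed to its external source rather than to the agent's owner, which addresses the indirect-prompt-injection and non-owner-instruction vulnerabilities cited above.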