Autonomous AI agents powered by large language models are being deployed in production with capabilities including shell execution, file system access, database queries, and multi-party communication. Recent red-teaming research demonstrates that these agents exhibit critical vulnerabilities in realistic settings: unauthorized compliance with non-owner instructions, sensitive information disclosure, identity spoofing, cross-agent propagation of unsafe practices, and indirect prompt injection through external resources [7]. In healthcare environments processing Protected Health Information, every such vulnerability becomes a potential HIPAA violation. This paper presents a security architecture deployed for nine autonomous AI agents in production at a healthcare technology company. We develop a six-domain threat model for agentic AI in healthcare covering credential exposure, execution capability abuse, network egress exfiltration, prompt integrity failures, database access risks, and fleet configuration drift. We implement defense in depth across four layers: (1) kernel-level workload isolation using gVisor on Kubernetes, (2) credential proxy sidecars preventing agent containers from accessing raw secrets, (3) network egress policies restricting each agent to allowlisted destinations, and (4) a prompt integrity framework with structured metadata envelopes and untrusted content labeling. We report results from 90 days of deployment, including four HIGH-severity findings discovered and remediated by an automated security audit agent, progressive fleet hardening across three VM image generations, and defense coverage mapped to all eleven attack patterns from recent literature. All configurations, audit tooling, and the prompt integrity framework are released as open source.
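To make the fourth layer concrete, the sketch below shows one way a structured metadata envelope with untrusted-content labeling could work. This is a minimal illustration under assumed conventions, not the paper's released framework: the function name `wrap_untrusted`, the marker strings, and the JSON header format are all hypothetical.

```python
import json

# Hypothetical delimiter strings; the released framework's actual
# envelope format may differ.
ENVELOPE_BEGIN = "<<UNTRUSTED"
ENVELOPE_END = "<<END_UNTRUSTED>>"

def wrap_untrusted(content: str, source: str) -> str:
    """Label externally fetched content (web pages, emails, tool output)
    so the agent's system prompt can instruct the model to treat it as
    data only, never as instructions.

    Neutralizes any embedded end-marker first, so fetched content cannot
    prematurely close its own envelope (a delimiter-injection attempt).
    """
    safe = content.replace(ENVELOPE_END, "<<escaped_END_UNTRUSTED>>")
    header = json.dumps({"source": source, "trust": "external"})
    return f"{ENVELOPE_BEGIN} {header}>>\n{safe}\n{ENVELOPE_END}"
```

The design intent is that trust metadata travels with the content itself: any instruction-like text inside the envelope is attributed to its external source rather than to the agent's owner, which addresses the indirect-prompt-injection and non-owner-instruction vulnerabilities cited above.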