As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.