Large Language Models (LLMs) are combined with plugins to create powerful LLM agents that provide a wide range of services. Unlike traditional software, an LLM agent's behavior is determined at runtime by natural language prompts from either the user or plugin data. This flexibility enables a new computing paradigm with unlimited capabilities and programmability, but it also introduces new security risks, leaving agents vulnerable to privilege escalation attacks. Moreover, user prompts are prone to being interpreted insecurely by LLM agents, creating non-deterministic behaviors that attackers can exploit. To address these security risks, we propose Prompt Flow Integrity (PFI), a system security-oriented solution that prevents privilege escalation in LLM agents. Based on an analysis of the architectural characteristics of LLM agents, PFI features three mitigation techniques: untrusted data identification, least-privilege enforcement on LLM agents, and validation of unsafe data flows. Our evaluation shows that PFI effectively mitigates privilege escalation attacks while preserving the utility of LLM agents.