Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.
翻译:大语言模型(LLMs)正越来越多地被部署为各类实际应用(即LLM集成应用)的后端。近期多项研究表明,LLM集成应用容易遭受提示注入攻击:攻击者通过向这些应用的输入中注入恶意指令或数据,使其生成攻击者期望的结果。然而,现有工作仅限于案例研究,导致相关文献中缺乏对提示注入攻击及其防御的系统性理解。本研究旨在填补这一空白。具体而言,我们提出了一个通用框架来形式化定义提示注入攻击——现有研究论文和博客中讨论的攻击方式均为该框架的特例。该框架使我们能够通过组合现有攻击设计新型攻击方法。此外,我们还提出了一个系统化防御提示注入攻击的框架。基于这些框架,我们在10个大语言模型和7项任务上对提示注入攻击及其防御进行了系统评估。我们希望这些框架能够启发该领域的未来研究。我们的代码开源地址为:https://github.com/liu00222/Open-Prompt-Injection。